Process for determining target function and identifying drug leads

ABSTRACT

The present invention relates to methods for using chemical ligands to determine target function and identify drug leads.

BACKGROUND OF THE INVENTION

1. INTRODUCTION

The present invention relates to a method of exposing targets to a plurality of potential ligands, collecting ligand-target pairs, using the ligand to analyze the target's biological function, and optionally identifying the ligand chemically and/or structurally. In one embodiment of the invention, ligands are selected which bind to pharmaceutically relevant targets. In another embodiment of the invention, ligand-target pairs are collected and analyzed on a genomic scale. The invention further relates to a method of screening a plurality of potential ligands in at least one bioassay for a change in phenotype and using the hit(s) to identify the corresponding molecular target.

2. BACKGROUND OF THE INVENTION

2.1. Traditional Approach to Drug Discovery

In general, drugs discovered in the last 50 years are based on a few hundred targets, and there are presently about 450 validated targets used for screening by all of the pharmaceutical companies combined. These targets have typically been developed using the traditional approach to drug discovery, in which the target is validated using reductionist biology including gene over-expression, gene knockout, gene sequence homology searching for functional domains, x-ray crystallography, or specific cellular and biological assays. Furthermore, in drug discovery as it is practiced today, target validation, assay development, high throughput screening and lead generation are performed in series.

2.2. Genomics

The large number of uncharacterized genes from the completion of the sequencing of the human genome makes it difficult but essential for a pharmaceutical company to validate and choose only the right target to unleash the value of the human genome sequence. It is estimated that of the 100,000 or more genes in the human genome, at most 10,000 of these genes will be pharmaceutically useful targets. This huge number of genes is overwhelming the reductionist approach to gene validation, thereby presenting a major bottleneck in drug discovery.

The accumulating mass of DNA sequence data has given rise to the field of functional genomics that promises to alleviate the bottleneck. Gene expression profiling can be studied using DNA arrays (De Risi J L et al., 1997, Science 278:680). Protein expression profiling can be performed using protein arrays (Paweletz C P et al., 2000, Drug Dev. Research 49:34). Gene function can be studied by the introduction or mutation of a gene to induce a conditional change in phenotype. Alternatively, an antisense or ribozyme version of a gene may be expressed in a variety of cell lines or organisms including transgenic or knockout mice, C. elegans, zebra fish, Drosophila or yeast (Couture L A et al., 1996, Trends in Genetics 12:510; Nadeau J H et al., 1998, Curr. Opin. Genet. Dev. 8:311).

Differential gene expression can be detected using a variety of techniques including: differential screening (Tedder T F et al., 1988, PNAS 85:208), subtractive hybridization (Hedrick S M et al., 1984, Nature 308:149), differential display (Liang P and Pardee A, 1993, U.S. Pat. No. 5,262,311), gene microarray (Lockhart D et al., 1996, Nature Biotechnology 14:1675; Schena M et al., 1995, Science 270:467; 2000, Nature Genetics 24:236), representational difference analysis (Hubank M et al., 1994, Nucleic Acids Research 22:5640), large scale sequencing of expressed sequence tags (ESTs), reverse transcriptase PCR, serial analysis of gene expression (SAGE; Nacht M et al., 1999, Cancer Res. 59:5464) and laser capture microdissection (Sgroi D C et al., 1999, Cancer Research 59:5656). Microarray technology represents the current state of the art for genomics and has been used to study cell cycles, biochemical pathways, genome wide expression in yeast, cell growth, cell differentiation, cell responses to a single compound, and genetic diseases (M. Schena, 1998, TIBTECH 16:301).

2.3. Identification and Characterization of Protein Targets

Using classical biochemical techniques, previously unknown receptors for small molecules have been identified at the protein level using in vitro biochemical methods including photo-crosslinking, radiolabeled ligand binding and affinity chromatography (Jakoby W B et al., 1974, Methods in Enzymology 46:1). These methods require purification of the protein. In order to clone the gene for the receptor, the peptide must be further sequenced and this sequence used to clone the cDNA for the protein. Small molecules can be radiolabeled and used to determine the molecular target (Kwon H J et al., 1998, PNAS 95:3356). Alternatively, small molecules can be immobilized on an agarose matrix and used to screen extracts of a variety of cell types and organisms. For example, purvalanol B (a known inhibitor of cyclin-dependent kinases) was immobilized on an agarose matrix and used to screen extracts from a diverse collection of cell types and organisms, and a number of proteins with kinase activity were isolated (Knockaert M et al., 2000, Chem. Biol. 7:411). As another example, trapoxin is a cyclotetrapeptide that inhibits histone deacetylation and arrests the cell cycle. Two nuclear proteins co-purified with histone deacetylase activity from fractionated cell extracts on an affinity matrix covalently modified with trapoxin. Subsequently the proteins were sequenced and cDNAs encoding the proteins were cloned from a cDNA library (Taunton J et al., 1996, Science 272:408).

Currently, the primary system for studying protein-protein interactions is the yeast two hybrid system. In this approach, one protein is fused to the DNA binding domain and another protein is fused to the transcriptional activation domain of a eukaryotic transcription factor, and both are expressed in the presence of a reporter gene which allows the yeast to grow. If the two heterologous proteins bring the two domains together, then the yeast containing the proteins which interact are selected by growth (Fields S et al., 1989, Nature 340:245).

A yeast “three hybrid” transcription activation system has been used to clone a gene encoding a previously identified receptor for the drug FK506. This three hybrid system displays an anchored derivative of the active ligand against a library of cDNAs fused to the transcriptional activation domain (Borchardt A et al., 1997, Chem. Biol. 4:961; Licitra E J et al., 1996, PNAS 93:12817). In Licitra et al., the hormone binding domain of the rat glucocorticoid receptor was fused to the Lex A DNA binding domain, a cDNA encoding the FK506 receptor (FKBP12) was fused to the transcriptional activation domain, and the two were expressed in the yeast two hybrid system. The yeast cells were plated on medium containing a heterodimer of covalently linked dexamethasone and FK506, and the cells grew in a way that may be inhibited by undimerized FK506. When the experiment was repeated with a cDNA expression library fused to the transcriptional activation domain in place of the cDNA encoding FK506 binding protein, the yeast which grew contained cDNA clones encoding the FK506 binding protein. However, this experiment was done using a chemical interacting with a known target. In Borchardt A et al., yeast cells in the presence of a FKBP12-GAL4 DNA binding domain fusion, the FR domain of the FK506 binding protein rapamycin associated protein, and rapamycin transcribe the HIS3 reporter gene, allowing the cells to grow in the absence of histidine (Borchardt A et al., 1997, Chem. Biol. 4:961).

Expression cloning can be used to test for the target within a small pool of proteins (King R W et al., 1997, Science 277:973). Peptides (Kieffer et al., 1992, PNAS 89:12048), nucleoside derivatives (Haushalter K A et al., 1999, Curr. Biol. 9:174), and drug-bovine serum albumin (drug-BSA) conjugates (Tanaka et al., 1999, Mol. Pharmacol. 55:356) have been used in expression cloning.

Another useful technique to closely associate ligand binding with DNA encoding the target is phage display. In phage display, which has been predominantly used in the monoclonal antibody field, peptide or protein libraries are created on the viral surface and screened for activity (Smith G P, 1985, Science 228:1315). Phage are panned for the target which is connected to a solid phase (Parmley S F et al., 1988, Gene 73:305). One of the advantages of phage display is that the cDNA is in the phage and thus no separate cloning step is required. Dyax has used a phage display affinity column to isolate macromolecules but not small molecules (US97/04425).

Recently, Sche et al. used the natural product FK506 as an affinity probe to clone FKBP12 from a T7 cDNA phage display library. They used an affinity matrix bearing biotinylated FK506 to screen a phage library prepared with human brain cDNA. The phage particles remaining after two rounds of affinity selection shared a common 450 bp insert which corresponded to full length FKBP12.

Alternatives to phage display include plasmid display (Cull et al., 1992, PNAS 89:1865; Schatz P J et al., 1996, Methods Enzymol 267:171), polysome display (Mattheakis L C et al., 1996, PNAS 91:9022; Mattheakis L C, 1996, Methods Enzymol 267:195), protein tagging (Whitehorn E A et al., 1995, Biotechnology 13:1215), ribosome display (Hanes J et al., 1998, PNAS 95:14130), and cell surface display in bacteria and eukaryotes (Georgiou G et al., 1997, Nat. Biotechnol 15:29; Chesnut J et al., 1996, J. Immunol. Methods 193:17). Peptides or proteins can also be linked chemically via puromycin to the mRNA that encodes them (Roberts R et al., 1997, PNAS 94:12297).

2.4. Chemical Genetics

Chemical genetics is a new and potentially powerful approach to defining gene function through the use of chemicals to cause a conditional change in gene expression or gene function. However, to date, it has not advanced far from traditional drug discovery, which uses traditional high throughput cell based screening assays against known targets, for which drugs are already available, to find more hits to those targets. The current status of chemical genetics is demonstrated in the work of Haggarty S J et al. (2000, Chem Biol 7:275), in which 139 compounds were identified from a high throughput screen of the Chembridge Diverset library for inhibition of mitosis in a cell based assay and then assayed in an in vitro tubulin polymerization assay. Of the 139 compounds, 52 were antagonists which destabilized tubulin by the same mechanism as colchicine. One compound was demonstrated to be an agonist which stabilized tubulin by the same mechanism as taxol. The remaining 86 compounds had no effect and thus likely modulated mitosis via non-tubulin targets. Among the compounds targeting non-tubulin targets, based upon visible effects on the chromosomes and cytoskeleton, 7 were believed to be weak antagonists of tubulin and one (monastrol) was demonstrated to inhibit the kinesin-related protein Eg5 (Mayer et al., 1999, Science 286:971). In the case of Haggarty S J et al., low affinity ligands were selected since assays were performed using a ligand concentration of 20 to 50 μM. However, low affinity ligands are of limited value in determining target function.
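The relationship between screening concentration and the affinity of the ligands selected can be illustrated with a simple equilibrium calculation. The following is a minimal sketch, not taken from the cited work, which assumes a 1:1 binding interaction with the ligand in large excess over the target, so that free ligand concentration is approximately equal to total ligand concentration. The function name and the dissociation constants are hypothetical and chosen only for illustration.

```python
def fractional_occupancy(ligand_um: float, kd_um: float) -> float:
    """Fraction of target bound at equilibrium for a simple 1:1 interaction.

    Assumption (not from the source): ligand is in large excess over target,
    so the free ligand concentration is approximated by the total.
    Both arguments are in micromolar.
    """
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand_um must be >= 0 and kd_um must be > 0")
    return ligand_um / (ligand_um + kd_um)


if __name__ == "__main__":
    # Hypothetical dissociation constants (micromolar), for illustration only.
    for kd_um in (0.01, 1.0, 20.0, 100.0):
        # Screening concentrations at the ends of the 20 to 50 micromolar range.
        for ligand_um in (20.0, 50.0):
            theta = fractional_occupancy(ligand_um, kd_um)
            print(f"Kd = {kd_um:g} uM, [L] = {ligand_um:g} uM -> occupancy {theta:.2f}")
```

Under these assumptions, a ligand with a dissociation constant of 20 μM still occupies half of the target at a screening concentration of 20 μM, so a screen run at 20 to 50 μM does not discriminate strongly against low affinity ligands.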

Rosania G R et al. identified a novel small molecule, myoseverin, by a cell morphological screen; myoseverin binds to tubulin to induce the reversible fission and proliferation of muscle cells. Unlike the current invention, these authors rely on the standard functional genomics DNA array approach to understand the mechanism (Rosania G R et al., 2000, Nat Biotechnol 18:304). Chemicals have been used to study function since colchicines were shown to have an effect on mitosis in 1889 (Eigsti O, 1949, Science 110:692). However, current practice is limited to identifying ligands which bind to known targets or to unidentified targets which result in a particular phenotype.

Previous efforts to characterize the function of unknown genes are exemplified by orphan receptor analysis. Orphan receptors are encoded by genes which share DNA sequence similarity with previously identified receptors. On that basis, such sequences are placed into a receptor superfamily for which the natural physiological role and ligand are unknown. The present state of the art is to use genetic techniques or to use drugs or protein ligands known to bind to other members of the family to determine their function (Werme M et al., 2000, Brain Res 863:112; Bordji K et al., 2000, J. Biol. Chem. 275:12243; Yang C, 1999, Cancer Res. 59:4519; Chiou L, 1999, Br. J. Pharmacol 128:103; Williams C, 2000, Curr. Opinion in Biotechnology 11:42).

2.5. Chemical Target Characterization

Once a target is validated, two major screening categories are applied: bioassays and mechanism based assays (Gordon et al., 1994, J. Med. Chem. 37:1386). Bioassays measure the effect of the compounds being screened on the viability or metabolism of a cell. For example, penicillin was discovered through its inhibition of growth in bacterial culture. Mechanism based assays include biochemical assays measuring an effect on enzymatic activity, cell based assays in which the target and a reporter system (e.g., luciferase or β-galactosidase) have been introduced into a cell (Monks A et al., 1997, Anticancer Drug Des. 12:533), or binding assays. Binding assays can be performed with the target fixed to a well, bead (Bosworth N et al., 1989, Nature 341:167; Meldal M, 1994, PNAS 91:3314) or chip (Sundberg S, 2000, Curr. Opin. in Biotechnology 11:47) or captured by an immobilized antibody, and the bound ligands are usually detected by colorimetry or by measuring fluorescence (Sundberg S, 2000, Curr. Opin. in Biotechnology 11:47).

In some newer binding assays, molecules binding to a target of known function have also been resolved by capillary electrophoresis (U.S. Pat. No. 5,783,397; US99/15458). In other new assays, libraries were weight-coded and deconvoluted using mass spectroscopy (Carell T et al., 1995, Chem Biol. 2:171; Fang A S et al., 1998, Comb Chem High Throughput Screen 1:23; US99/23837; US99/00024). HPLC has also been used with mass spectroscopy to characterize combinatorial library purity and to analyze metabolites in plasma samples (Korfmacher W A et al., 1999, Rapid Commun Mass Spectrom 13:1991; Zeng L et al., 1998, Comb Chem High Throughput Screen 1:101; Nedved M L et al., 1996, Anal Chem 68:4228; Zimmer D et al., 1999, J. Chromatogr A 854:23; Aubagnac J L, Comb Chem High Throughput Screen 2:289).

3. SUMMARY OF THE INVENTION

The present invention relates to the use of a target of unknown function to select for small molecules from a chemical library which are then used in an assay to determine the target's function. According to the invention, members of the chemical library are mixed with the protein in a biochemical binding assay and those that bind are then (sequentially or in parallel) used in an in vitro or in vivo bioassay to determine the function of the gene by a change in a measurable phenotype in a biological or pathological condition.

Alternatively, the invention uses chemicals which induce a phenotypic change in a bioassay to determine the identity of the target. The invention provides a method of screening a plurality of potential ligands in at least one bioassay, selecting ligands which produce a change in phenotype in a bioassay, and using the ligand to screen candidate targets to identify the particular target(s) responsible for the altered phenotype.

The invention can be used to define the function of genes and to simultaneously validate the drug target and generate a drug lead, thus streamlining the drug discovery process. The structure activity relationship information provided by the parallel comparison of a large number of structurally diverse hits which bind to the target but have different activities in phenotypic assays can be used to rapidly optimize the lead. Using the invention, the massive numbers of genes provided by genomics can be systematically sorted and useful drug targets can be validated and selected for a given disease.

The present invention is different from the art because the latter describes screening against a known target while the present invention does not require any prior knowledge of target identity or function. Furthermore, the present invention does not absolutely require the constraint of a predetermined subunit of a particular mass in the construction of its library. According to the invention, virtually any ligand library produced by combinatorial or noncombinatorial means may be used. Non-limiting examples include chemical, peptide, natural product, natural product-like, sugar or antibody libraries. Peptides and proteins can be made to cross the cell membrane using a sequence from HIV TAT, HSV VP22 or Antennapedia peptides containing protein transduction domains (Schwarze S R et al., 2000, Trends in Cell Biology 10:290). Libraries may consist of pools of ligands or may be collections of single ligands screened individually.

Accordingly, in one aspect, the invention features a method for selecting a candidate ligand which binds a target molecule. This method involves contacting an in vitro sample including a target molecule with a library of candidate ligands under conditions that allow complex formation between the target molecule and one or more of the candidate ligands. The complex is isolated, and one or more of the candidate ligands are recovered from the complex. Additionally, one or more recovered candidate ligands are identified.

In various embodiments of the above aspect, the target molecule is a molecule of unknown biological function or a molecule that has not been previously validated as a drug target. In other embodiments, the library includes at least two different chemical scaffolds or includes at least 11 different compounds. In other embodiments, the complex is isolated using size exclusion or biphasic chromatography (e.g., chromatography using an internal surface reverse phase (ISRP), GFF, or GFFII resin). In other embodiments, MS, IR, FTIR, NMR, and/or UV analysis is used to identify the recovered candidate ligand. In yet other embodiments, the method includes determining the mass to charge ratio of a parent peak, a fragment peak, and/or an isotope peak in the mass spectrum of the recovered candidate ligand. In one embodiment, the method also includes contacting the sample with a competitor ligand known to bind the target molecule. This competitor may reduce the number of low affinity candidate ligands that bind the target molecule, allowing the higher affinity candidate ligands to be selected.
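By way of illustration only, identifying a recovered candidate ligand from the mass to charge ratio of its parent peak can be sketched as a lookup against the known masses of the library members. The sketch below is not the method of the invention as claimed; it assumes positive-mode ionization by protonation, a singly charged ion by default, and a matching tolerance in parts per million. The library entries, their masses, and all function names are hypothetical.

```python
from dataclasses import dataclass

PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


@dataclass(frozen=True)
class LibraryMember:
    name: str
    monoisotopic_mass: float  # daltons


def expected_mz(mass_da: float, charge: int = 1) -> float:
    """m/z of the protonated ion [M + zH]z+ (assumed ionization mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mass_da + charge * PROTON_MASS_DA) / charge


def match_parent_peak(observed_mz, library, charge=1, tol_ppm=10.0):
    """Return library members whose expected m/z lies within tol_ppm of the peak."""
    matches = []
    for member in library:
        expected = expected_mz(member.monoisotopic_mass, charge)
        error_ppm = abs(observed_mz - expected) / expected * 1e6
        if error_ppm <= tol_ppm:
            matches.append(member)
    return matches


# Hypothetical library; names and masses are invented for illustration.
LIBRARY = [
    LibraryMember("ligand_A", 300.1000),
    LibraryMember("ligand_B", 300.1100),
    LibraryMember("ligand_C", 450.2000),
]

if __name__ == "__main__":
    hits = match_parent_peak(301.1073, LIBRARY)
    print([member.name for member in hits])
```

Where two library members are too close in mass to be resolved by the parent peak alone, fragment peaks or isotope peaks, as mentioned above, would be needed to distinguish them; this sketch does not attempt that.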

In another aspect, the invention features another method for selecting a candidate ligand which binds a target molecule. This method involves contacting an in vitro sample including a first target molecule and a second target molecule with a library of candidate ligands under conditions that allow complex formation between the first target molecule and one or more of the candidate ligands and allow complex formation between the second target molecule and one or more of the candidate ligands. A first complex including the first target molecule bound to a candidate ligand and a second complex including the second target molecule bound to a candidate ligand are isolated. One or more of the candidate ligands from the first complex and/or from the second complex are recovered and identified. In one embodiment, the method also includes contacting the sample with a competitor ligand known to bind the first target molecule or the second target molecule.
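The way in which a competitor ligand may favor higher affinity candidate ligands can be illustrated with the standard equilibrium expression for competitive binding at a single site. The following is a minimal sketch under stated assumptions, not a description of any particular embodiment: one candidate ligand and one competitor bind the same site, the target is limiting so that free concentrations approximate total concentrations, and competition among the candidate ligands themselves is ignored. The function name and all concentrations and dissociation constants are hypothetical.

```python
def occupancy_with_competitor(
    ligand_um: float,
    kd_ligand_um: float,
    competitor_um: float,
    kd_competitor_um: float,
) -> float:
    """Fraction of target bound by the candidate ligand at equilibrium.

    Assumes simple competitive binding at one site: the competitor raises the
    ligand's apparent dissociation constant by the factor (1 + [C] / Kd_C).
    All values are in micromolar.
    """
    if kd_ligand_um <= 0 or kd_competitor_um <= 0:
        raise ValueError("dissociation constants must be positive")
    if ligand_um < 0 or competitor_um < 0:
        raise ValueError("concentrations must be non-negative")
    apparent_kd = kd_ligand_um * (1.0 + competitor_um / kd_competitor_um)
    return ligand_um / (ligand_um + apparent_kd)


if __name__ == "__main__":
    # Hypothetical values: 10 uM of each candidate ligand, and a competitor
    # with a dissociation constant of 1 uM.
    for label, kd_um in (("high affinity", 0.01), ("low affinity", 10.0)):
        without = occupancy_with_competitor(10.0, kd_um, 0.0, 1.0)
        with_comp = occupancy_with_competitor(10.0, kd_um, 10.0, 1.0)
        print(f"{label}: {without:.3f} without competitor, {with_comp:.3f} with")
```

In this idealized case the competitor lowers the occupancy of the low affinity ligand far more than that of the high affinity ligand, which is the sense in which a competitor can bias selection toward higher affinity candidates.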

Additionally, the invention provides various methods for determining the biological function of a target molecule, such as a naturally or non-naturally occurring protein, nucleic acid, carbohydrate, or other organic molecule. The methods may be used to determine the function of a gene or a protein of interest, such as a gene or protein that is upregulated or downregulated in a particular disease state or in the presence of a particular biological stimulus (such as TNFα). The methods may also be used to identify therapeutically active compounds for the treatment of a disease state.

In one such aspect, the invention provides a method for determining the biological function of a target molecule. This method includes contacting an in vitro sample including a target molecule with a library of candidate ligands under conditions that allow one or more of the candidate ligands to bind the target molecule. A candidate ligand which binds the target molecule is selected. The effect of the selected candidate ligand in a biological assay is measured, thereby determining the biological function of the target molecule. In various embodiments, the target molecule is a molecule of unknown biological function or a molecule that has not been previously validated as a drug target. In other embodiments, the target molecule is upregulated or downregulated in a disease state, in the presence of a physiological stimulus (e.g., a cytokine such as TNF), or during a specific cellular or biological process. In particular embodiments, the target molecule is upregulated or downregulated during angiogenesis, differentiation, proliferation, or insulin secretion. In one embodiment, the selected candidate ligand is identified using a method such as MS, IR, FTIR, NMR, UV, or any other appropriate method. In particular embodiments, the selected candidate ligand increases the activity of the target molecule in the biological assay. For example, the candidate ligand may activate an activity of the target molecule (such as an enzymatic activity), promote the production of the target molecule, increase the stability of the target molecule, alter the localization of the target molecule, or promote the association of the target molecule with another molecule. In other embodiments, the selected candidate ligand decreases the activity of the target molecule in the biological assay. For example, the candidate ligand may inhibit an activity of the target molecule, inhibit the production of the target molecule, decrease the stability of the target molecule, alter the localization of the target molecule, or inhibit the association of the target molecule with another molecule. Exemplary biological assays include a high throughput screen using a nontransfected cell line, cell, tissue, or other biological system where the target is not previously known. In other embodiments, the biological assay involves determining the effect of the selected candidate ligand on a tissue from an organism having a disease or disorder or undergoing a specific cellular or biological process, in the presence or absence of a physiological stimulus, thereby determining the biological function of the target molecule. In one embodiment, the tissue is a mammalian tissue, such as a human tissue.

Methods for crosslinking or reacting two or more ligands which bind the same target molecule are also provided. These methods allow one or more target surfaces to promote or catalyze the reaction between two ligands. These methods may be used to screen a library of ligands to determine what ligands bind the target molecule and what products containing a combination of ligands bind the target molecule with the highest affinity. The products may be used as lead compounds in the development of therapeutics or used to characterize the active site of the target molecule. Related methods may be used to crosslink or react two or more ligands which bind different target molecules. These methods may be used to determine what target molecules interact with a target molecule of interest, thereby determining what molecules are in the same pathway as the target molecule of interest.

In another aspect, the invention features a method for reacting two or more ligands that bind a target molecule of interest. This method involves contacting a cell or in vitro sample including a target molecule with a first ligand (e.g., a first ligand having a first crosslinker) and with a second ligand under conditions that allow the target molecule to bind both the first ligand and the second ligand and allow the first ligand or the first crosslinker to covalently bind the second ligand, thereby generating a product including the first ligand and the second ligand. In some embodiments, the target molecule is a molecule of unknown secondary or tertiary structure. In other embodiments, the location or the tertiary structure of the binding site in the target molecule for the first ligand or the second ligand is unknown. In a particular embodiment, the affinity of the product for the target molecule is greater than the affinity of the first ligand or the second ligand for the target molecule. In another embodiment, the product is used for drug discovery or development, lead optimization, or development of an agricultural or environmental agent. In yet another embodiment, the target molecule promotes or catalyzes the reaction between the first and second ligands. In another embodiment, the first ligand is reacted with a crosslinker prior to being contacted with the target molecule. In yet another embodiment, the first ligand, the second ligand, and a crosslinker are reacted in the presence or absence of the target molecule. In preferred embodiments, the method also includes identifying the products with the greatest affinity for the target molecule. For example, the method may also include (a) contacting an in vitro sample including the target molecule with one or more products under conditions that allow complex formation between the target molecule and one or more products, (b) isolating the complex, (c) recovering one or more products from the complex, and (d) identifying one or more recovered products.

In a related aspect, the invention features a method for selecting a candidate ligand which binds a target molecule. This method includes contacting an in vitro sample including a target molecule with a library of candidate ligands under conditions that allow complex formation between the target molecule and one or more candidate ligands. The complex is isolated, and one or more candidate ligands are recovered from the complex. In a preferred embodiment, more than one candidate ligand is identified in this manner. A cell or in vitro sample including the target molecule is contacted with a first recovered ligand and a second recovered ligand. The contacting is conducted under conditions that allow the target molecule to bind the first recovered ligand and the second recovered ligand and allow the first recovered ligand to covalently bind the second recovered ligand, thereby generating a product including the first recovered ligand and the second recovered ligand that has an affinity for the target molecule that is greater than the affinity of the first recovered ligand or the second recovered ligand for the target molecule. In some embodiments, the method also includes contacting an in vitro sample including the target molecule with one or more products under conditions that allow complex formation between the target molecule and one or more products. The complex is isolated, and one or more products are recovered from the complex and identified.

In another related aspect, the invention features another method for selecting a candidate ligand which binds a target molecule. This method includes contacting an in vitro sample including a target molecule with a library of candidate ligands under conditions that allow complex formation between the target molecule and more than one candidate ligand. The complex is isolated, and more than one candidate ligand is recovered from the complex. A first recovered ligand and a second recovered ligand are reacted, thereby generating a product including the first recovered ligand and the second recovered ligand that has an affinity for the target molecule that is greater than the affinity of the first recovered ligand or the second recovered ligand for the target molecule. In preferred embodiments, the method also includes contacting an in vitro sample including the target molecule with one or more products under conditions that allow complex formation between the target molecule and one or more products. The complex is isolated, and one or more products are recovered from the complex and identified.

In another aspect, the invention features a method for reacting two ligands that bind different target molecules. This method includes contacting a cell or in vitro sample including a first target molecule and a second target molecule with a first ligand (e.g., a first ligand having a first crosslinker) and with a second ligand. The contacting is conducted under conditions that allow (i) the first target molecule to bind the first ligand, (ii) the second target molecule to bind the second ligand, and (iii) the first ligand or the first crosslinker to covalently bind the second ligand, thereby generating a product including the first ligand and the second ligand. In one embodiment, the location or the tertiary structure of the binding site in the first target molecule for the first ligand and/or the location or the tertiary structure of the binding site in the second target molecule for the second ligand is unknown. In one embodiment, the generation of the product indicates that the first target molecule (e.g., a protein) and the second target molecule (e.g., a protein) interact in vivo or are part of the same biological pathway. In another embodiment, the product is used for drug discovery or development, lead optimization, or development of an agricultural or environmental agent. In yet another embodiment, one or both target molecules promote or catalyze the reaction between the first and second ligands. In another embodiment, the first ligand is reacted with a crosslinker prior to being contacted with the target molecules. In yet another embodiment, the first ligand, the second ligand, and a crosslinker are reacted in the presence or absence of the target molecules.

In another aspect, the invention provides a method for isolating a second protein which binds a first protein. This method involves contacting a cell or an in vitro sample including a first protein and a second protein with a first ligand (e.g., a first ligand having a first crosslinker) and with a second ligand. The contacting is conducted under conditions that allow (i) the first protein to bind the first ligand, (ii) the second protein to bind the second ligand, and (iii) the first ligand or the first crosslinker to covalently bind the second ligand, thereby generating a product including the first ligand and the second ligand and generating a complex including the product, the first protein, and the second protein. The complex is isolated, and the first protein and/or the second protein in the complex or recovered from the complex is identified. In one embodiment, the first and/or second protein includes a detectable group. In another embodiment, the second ligand includes a crosslinker. In one embodiment, the generation of the product indicates that the first protein and the second protein interact in vivo or are part of the same biological pathway. In another embodiment, the product is used for drug discovery or development, lead optimization, or development of an agricultural or environmental agent.

The invention also provides numerous methods for selecting a target molecule which binds a compound of interest. For example, the compound may be a molecule that appears to promote or inhibit a disease state. The selected target molecule may be used, for example, to study the disease, to identify other molecules associated with the disease, and to identify therapeutics which bind or modulate the activity of the target molecule or another member of the disease pathway.

In another aspect, the invention provides a method for selecting a candidate target molecule which binds a small molecule of interest. The method involves contacting an in vitro sample including a small molecule of interest with a library of candidate target molecules under conditions that allow complex formation between the small molecule of interest and one or more of the candidate target molecules. The complex is isolated, and one or more of the candidate target molecules are recovered from the complex, thereby selecting one or more candidate target molecules which bind the small molecule of interest. In various embodiments, the library of candidate target molecules is recombinantly produced or is obtained from an extract from a cell, tissue, or organism. The library of candidate target molecules can be unpurified, partially purified, or completely purified from other components prior to being contacted with the small molecule of interest. In various embodiments, the target molecules are expressed on the surface of phage or are not expressed on the surface of phage. In one embodiment, prior to contacting the small molecule with the library of candidate target molecules, the small molecule of interest is selected from a library of small molecules based on its effect in a biological assay. In one embodiment, the method also includes identifying the selected target protein. In particular embodiments, the small molecule of interest has a moiety other than an amino acid or has a molecular weight less than 5000, 4000, 3000, 2000, 1000, 750, 500, or 250 daltons.

In another aspect, the invention provides a method for selecting a target protein which binds a small molecule of interest. This method includes expressing in a population of cells a protein fusion including a target protein covalently linked to a surface protein, the expression being carried out under conditions that allow the display of the protein fusion on the surface of the cells. The cells are contacted with a small molecule of interest, and the cells which bind the small molecule of interest are selected, thereby selecting the target proteins which bind the small molecule of interest. Exemplary cells include mammalian, bacterial, yeast, and insect cells. In one embodiment, the method also includes identifying the selected target protein. In particular embodiments, the small molecule of interest has a moiety other than an amino acid or has a molecular weight less than 5000, 4000, 3000, 2000, 1000, 750, 500, or 250 daltons.

In another aspect, the invention features another method for selecting a target protein which binds a small molecule of interest. This method involves expressing in a population of cells a protein fusion including a target protein covalently linked to a surface protein, the expression being carried out under conditions that allow the display of the protein fusion on the surface of viruses released from the cells infected with the virus. The viruses are contacted with a small molecule of interest, and the viruses which bind the small molecule of interest are selected, thereby selecting the target proteins which bind the small molecule of interest. In one embodiment, the method also includes identifying the selected target protein. In various embodiments, the virus is a bacteriophage or adenovirus. In particular embodiments, the small molecule of interest has a moiety other than an amino acid or has a molecular weight less than 5000, 4000, 3000, 2000, 1000, 750, 500, or 250 daltons. In yet other embodiments, the small molecule of interest does not contain biotin or is not naturally produced by bacteria. In still other embodiments, the small molecule of interest is a nucleic acid, lipid, or carbohydrate. In still other embodiments, the small molecule of interest is immobilized on a solid surface such as a magnetic or fluorescent bead. In other embodiments, an adenovirus is used to infect 293 cells or PER.C6 cells, or a bacteriophage is used to infect bacteria.

In another aspect, the invention features a method for selecting a target protein which binds a small molecule of interest. This method involves expressing in a population of cells or an in vitro sample a library of target proteins in which each target protein is covalently linked to a nucleic acid encoding the target protein. The cells or in vitro sample are contacted with a small molecule of interest, and the target proteins which bind the small molecule of interest are selected. In one embodiment, the method also includes identifying the selected target protein. In particular embodiments, the small molecule of interest has a moiety other than an amino acid or has a molecular weight less than 5000, 4000, 3000, 2000, 1000, 750, 500, or 250 daltons.

In various embodiments of any of the above methods for selecting a target molecule or target protein which binds a small molecule of interest, at least 2, 5, 10, 20, 50, 100, 1000, 10000, or more target molecules are contacted with the small molecule. In other embodiments, a target peptide or protein is associated with a polynucleotide encoding the target, using standard methods such as phage display, cell surface display, plasmid display, ribosome display, or viral display. In other embodiments, the small molecule is immobilized on a solid surface, such as a column, bead, or magnetic bead. In other embodiments, the small molecule contains a fluorescent group, or the small molecule is indirectly or directly linked to a fluorescent group (e.g., linked through the binding of a fluorescently labeled antibody), and the complex of the small molecule and a target molecule is isolated using FACS sorting. In other embodiments, the small molecule of interest is a non-naturally occurring molecule or a naturally occurring molecule from an organism other than bacteria (e.g., a naturally occurring human molecule).

The invention also provides methods for identifying compounds that bind a target molecule before the target molecule is experimentally validated as a drug target. Additionally, methods are provided for identifying ligands for two or more target molecules. For example, binders can be simultaneously identified for multiple target molecules by performing an assay containing multiple target molecules or by performing multiple assays in parallel. These high throughput assays greatly increase the number of target molecules that can be analyzed.

Accordingly, in one aspect, the invention provides a method for selecting a candidate compound that binds or modulates the activity of a target molecule prior to validation of the target molecule as a drug target. This method involves contacting a cell or an in vitro sample including a target molecule that has not been previously validated as a drug target with a library of candidate compounds under conditions that allow one or more of the candidate compounds to bind or modulate the activity of the target molecule. A candidate compound which binds or modulates the activity of the target molecule is selected. In one embodiment, the selected candidate compound is identified. In other embodiments, the method also includes measuring the effect of the selected candidate compound in a biological assay, thereby determining the biological function of the target molecule. In yet other embodiments, the cell or in vitro sample includes at least 2, 5, 10, 20, 30, 50, 100, or more target molecules, and for each of the target molecules, a candidate compound is selected that binds or modulates the activity of the target molecule.

In another aspect, the invention features a method for selecting candidate compounds that bind or modulate the activity of target molecules. This method involves contacting a cell or an in vitro sample including a first target molecule and a second target molecule with a library of candidate compounds under conditions that allow one or more of the candidate compounds to bind or modulate the activity of the first target molecule and allow one or more of the candidate compounds to bind or modulate the activity of the second target molecule. A candidate compound which binds or modulates the activity of the first target molecule is selected, and a candidate compound which binds or modulates the activity of the second target molecule is selected. In one embodiment, one or more of the selected candidate compounds are identified. In other embodiments, the method also includes measuring the effect of one or more of the selected candidate compounds in a biological assay, thereby determining the biological function of the target molecule. In yet other embodiments, the cell or in vitro sample includes at least 5, 10, 20, 30, 50, 100, or more target molecules, and for each of the target molecules, a candidate compound is selected that binds or modulates the activity of the target molecule.

The invention also features a variety of databases. These databases are useful for storing the information obtained in any of the methods of the invention. These databases may also be used in the development of therapeutics and in the selection of a preferred therapeutic for a particular patient or class of patients. Many other uses of these databases are described herein.

In one such aspect, the invention features an electronic database including at least 10, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ records of target molecules correlated to records of ligands and their ability to bind or modulate the activity of the target molecules. In a related aspect, the invention provides an electronic database including a plurality of records of target molecules that have not been previously validated as drug targets and/or target molecules of unknown biological function correlated to records of ligands and their ability to bind or modulate the activity of the target molecules. In another related aspect, the invention features an electronic database including at least 10, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ records of target molecule domains correlated to records of ligands and their ability to bind the domains. By “domain” is meant a domain found in one or more proteins that catalyze the same type of reaction or that bind the same type of molecules; or the domains are identified as different protein structural motifs or functional families based upon the analysis of DNA or amino acid sequences, x-ray crystal structures, or biological assays. For example, the database may contain records of ligands and their ability to bind a kinase domain (i.e., able to bind one or more kinases) or a phosphatase domain (i.e., able to bind one or more phosphatases). This database may be used, for example, for characterizing the binding sites of proteins or other target molecules and for determining the selectivity of ligands for particular binding sites or particular families of compounds.
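By way of illustration only, the correlation between target (or domain) records and ligand records might be sketched as follows. The record layout, the field names, the `ligands_for_domain` helper, and the example identifiers are all assumptions made for this sketch; the specification does not prescribe a schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingRecord:
    """One hypothetical row: a target, its domain family, a ligand, and the result."""
    target_id: str
    domain: str      # e.g. "kinase" or "phosphatase"
    ligand_id: str
    binds: bool      # True if the ligand was observed to bind the target


def ligands_for_domain(records, domain):
    """Return the ligands observed to bind at least one target in a domain family."""
    return sorted({r.ligand_id for r in records if r.domain == domain and r.binds})


# Illustrative data only; the identifiers are placeholders, not real compounds.
EXAMPLE_RECORDS = [
    BindingRecord("T1", "kinase", "L1", True),
    BindingRecord("T2", "kinase", "L2", False),
    BindingRecord("T3", "phosphatase", "L2", True),
    BindingRecord("T4", "kinase", "L3", True),
]

kinase_binders = ligands_for_domain(EXAMPLE_RECORDS, "kinase")
```

A production database would more plausibly use a relational store with indexed target and ligand tables; the in-memory list is used here only to keep the sketch self-contained.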

In various embodiments of the above databases, the database includes records for at least 0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the proteins or protein domains in the proteome of an organism, such as a bacterium, yeast, or mammal. In particular embodiments, the database includes records for at least 0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the proteins or protein domains in the human proteome. In yet other embodiments, the database includes records for at least one protein expressed by an open reading frame for at least 0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the open reading frames in the genome of an organism.

In another aspect, the invention features a computer including a database of the invention and a user interface (i) capable of displaying one or more ligands that bind or modulate the activity of a target molecule whose record is stored in the computer or (ii) capable of displaying one or more target molecules that bind or have an activity that is modulated by a ligand whose record is stored in the computer. Exemplary databases include at least 10 records of target molecules, such as target molecules that have not been previously validated or target molecules of unknown biological function. In another aspect, the invention provides an electronic database including at least 10², 10³, 5×10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ records of compounds correlated to records of a phenotype in one or more biological assays that is effected by the compounds. The biological assay involves a cell or in vitro sample that does not contain an exogenous copy of a nucleic acid encoding a protein that binds the compound or does not contain an exogenous reporter gene.

In another aspect, the invention features a computer including the database of the above aspect and a user interface (i) capable of displaying one or more phenotypes in one or more biological assays for a compound whose record is stored in the computer or (ii) capable of displaying one or more compounds that effect a phenotype whose record is stored in the computer.

In another aspect, the invention provides an electronic database including at least 10 records of target molecules correlated to records of an expression profile or activity of the target molecules. In another aspect, the invention features an electronic database including a plurality of records of target molecules that have not been previously validated as drug targets and/or target molecules of unknown function correlated to records of an expression profile or activity of the target molecules. In various embodiments of either database, the database includes records for at least 0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the proteins in the proteome of an organism, or on at least 10², 10³, 5×10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ target molecules. In other embodiments, the database includes records for at least 0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the proteins in the proteome of an organism (e.g., the human proteome). In yet other embodiments, the database includes records for at least one protein expressed by an open reading frame for at least 0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the open reading frames in the genome of an organism.

In yet another aspect, the invention provides a computer including a database of the invention and a user interface (i) capable of displaying one or more expression profiles or activities of a target molecule whose record is stored in the computer or (ii) capable of displaying one or more target molecules that have an expression profile or activity whose record is stored in the computer. In various embodiments, the database includes at least 10 records of target molecules, such as target molecules that have not been previously validated as drug targets or target molecules of unknown function.

Any of the databases or computers can be used in any of the following methods. Exemplary uses of these databases include clustering of chemical scaffolds and types of active sites/proteins, global indexing of binding properties such as binding uniqueness and overlap, determining the specificity of a scaffold for a target, determining the potential toxicity of a compound (e.g., identifying a compound specific only for the target or a compound that does not bind to proteins important in metabolism and toxicity such as P450 isomers; generally, binding predicts metabolism), selecting a compound to probe a particular biology or pathology, identifying compounds which bind to probe the structure of a target or generate a “chemical crystal structure” with clusters around functional domains on the protein (alone or in conjunction with other techniques, e.g., NMR, x-ray crystallography, or computational chemistry approaches), identifying a protein domain by searching across the database for shared domains within proteins to which the compound binds, identifying substitutions on a chemical scaffold which modulate binding to create a mini SAR, selecting a target molecule responsible for the action of a particular compound, discovering alternative targets/indications for a compound (e.g., a drug or experimental compound), discovering alternative compounds (e.g., preferably alternative chemical structures or scaffolds) which bind to a target, selecting a therapy based on pharmacogenomics, and selecting scaffolds to serve as leads for optimization of a drug.

In one such aspect, the invention features a method of identifying a target molecule associated with a phenotype of interest. This method involves using an electronic database including a plurality of records of phenotypes in a biological assay correlated to records of the ligands and their ability to cause or contribute to the phenotypes. A selection of a phenotype of interest is received, and one or more ligands which contribute to the phenotype of interest are identified. An electronic database including a plurality of records of ligands correlated to records of the target molecules that bind the ligands or have an activity that is modulated by the ligands is used to identify one or more target molecules that bind or are modulated by the ligand(s) which contribute to the phenotype of interest, thereby identifying one or more target molecules associated with the phenotype of interest. In one embodiment, the phenotype of interest is associated with a disease state, and the target molecule is determined to promote or inhibit the disease state. In one embodiment, the method is computer implemented.
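A computer-implemented embodiment of this two-step lookup (phenotype to ligands, then ligands to targets) might be sketched as below. The two dictionaries stand in for the two electronic databases; their layout, the function name `targets_for_phenotype`, and all identifiers are hypothetical and chosen only for illustration.

```python
def targets_for_phenotype(phenotype, phenotype_db, target_db):
    """Chain two lookups: phenotype -> contributing ligands -> bound targets.

    phenotype_db maps ligand id -> set of phenotypes the ligand contributes to.
    target_db maps ligand id -> set of targets the ligand binds or modulates.
    Returns (ligands, targets) as sorted lists.
    """
    # Step 1: ligands recorded as causing or contributing to the phenotype.
    ligands = {lig for lig, phenos in phenotype_db.items() if phenotype in phenos}
    # Step 2: targets bound or modulated by any of those ligands.
    targets = set()
    for lig in ligands:
        targets |= target_db.get(lig, set())
    return sorted(ligands), sorted(targets)


# Placeholder data, not experimental results.
PHENOTYPE_DB = {
    "L1": {"growth arrest"},
    "L2": {"growth arrest", "apoptosis"},
    "L3": {"apoptosis"},
}
TARGET_DB = {
    "L1": {"T1"},
    "L2": {"T1", "T2"},
    "L3": {"T3"},
}

ligands, targets = targets_for_phenotype("growth arrest", PHENOTYPE_DB, TARGET_DB)
```

The reverse query, from a target of interest to associated phenotypes, would follow the same pattern with the two lookups applied in the opposite order.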

In yet another aspect, the invention features a method of identifying a phenotype that is associated with a target molecule of interest. This method involves providing an electronic database including a plurality of records of target molecules correlated to records of the ligands and their ability to bind or modulate the activity of the target molecules, and receiving a selection of a target molecule of interest. One or more ligands which bind or modulate the activity of the target molecule of interest are identified. An electronic database including a plurality of records of ligands correlated to records of phenotypes in a biological assay caused by the ligands is provided and used to identify one or more phenotypes in a biological assay caused by the ligand(s), thereby identifying one or more phenotypes associated with the target molecule of interest. In one embodiment, the method is computer implemented.

In yet another aspect, the invention features a method of identifying a ligand that binds or modulates the activity of a target molecule of interest. This method involves providing an electronic database including at least 10 records of target molecules correlated to records of the ligands and their ability to bind or modulate the activity of the target molecules, and receiving a selection of a target molecule of interest. One or more ligands which bind or modulate the activity of the target molecule of interest are identified. In various embodiments, the method includes comparing the chemical structures of two or more ligands which bind or modulate the activity of the target molecule of interest, thereby identifying functional groups in the ligands which promote the binding or modulation of the target molecule of interest. In other embodiments, the method also includes comparing the chemical structures of two or more ligands which bind or modulate the activity of the target molecule of interest, thereby determining the frequency of one or more functional groups or scaffolds in the collection of the ligands. In other embodiments, one or more compounds that have one or more functional groups that are present in two or more of the ligands are selected for use in drug discovery or development or lead optimization. In one embodiment, the method is computer implemented.
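The frequency determination described above might, for example, be sketched as follows. The sketch assumes that each ligand has already been annotated with a list of functional-group labels (how those labels are derived from a chemical structure is outside its scope), and the function names and data are hypothetical.

```python
from collections import Counter


def functional_group_frequency(ligand_groups):
    """Count in how many ligands each functional group occurs.

    ligand_groups maps ligand id -> iterable of functional-group labels.
    A group repeated within one ligand is counted once for that ligand.
    """
    counts = Counter()
    for groups in ligand_groups.values():
        counts.update(set(groups))
    return counts


def shared_groups(ligand_groups, min_ligands=2):
    """Return the groups present in at least min_ligands of the ligands."""
    freq = functional_group_frequency(ligand_groups)
    return sorted(g for g, n in freq.items() if n >= min_ligands)


# Placeholder annotations for three ligands that bind the same target.
LIGAND_GROUPS = {
    "L1": ["amide", "phenyl"],
    "L2": ["phenyl", "hydroxyl"],
    "L3": ["phenyl", "amide", "amide"],
}

common = shared_groups(LIGAND_GROUPS)
```

Groups returned by `shared_groups` are candidates for the functional groups that promote binding; compounds carrying them could then be prioritized for lead optimization.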

In yet another aspect, the invention features a method of identifying a target molecule that binds or has an activity that is modulated by a ligand of interest. This method involves providing an electronic database including at least 10 records of ligands correlated to records of the target molecules that bind or have an activity that is modulated by the ligands, and receiving a selection of a ligand of interest. One or more target molecules that bind or have an activity that is modulated by the ligand of interest are identified. In various embodiments, the method includes comparing the chemical structures of two or more target molecules which bind the ligand of interest, thereby identifying functional groups or domains in the target molecules which promote or contribute to the binding of the ligand of interest.

In yet another aspect, the invention features a method for determining the selectivity of a ligand of interest. This method involves providing an electronic database including at least 10 records of target molecules correlated to records of the ligands and their ability to bind or modulate the activity of the target molecules, and receiving a selection of a ligand of interest. The number of target molecules in the database that bind or are modulated by the ligand is determined, thereby determining the selectivity of the ligand of interest. In various embodiments, the ligand increases an activity of a target molecule, wherein the activity is associated with a disease state, an adverse side-effect, or toxicity and the ligand is eliminated from drug discovery or development, lead optimization, or development of an agricultural or environmental agent. In other embodiments, the ligand decreases an activity of a target molecule, wherein the activity is associated with a disease state, an adverse side-effect, or toxicity and the ligand is selected for discovery or development, lead optimization, or development of an agricultural or environmental agent. In one embodiment, the method is computer implemented.
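As a minimal sketch of the counting step, assuming the database can be read as (target, ligand, binds) tuples, the selectivity determination could look like the following. The tuple layout, function names, and data are illustrative assumptions only.

```python
def targets_bound(ligand, records):
    """Return the distinct targets that the ligand binds or modulates."""
    return sorted({t for t, lig, binds in records if lig == ligand and binds})


def selectivity(ligand, records):
    """Number of targets in the database bound by the ligand.

    A lower count indicates a more selective ligand.
    """
    return len(targets_bound(ligand, records))


# Placeholder (target, ligand, binds) records.
RECORDS = [
    ("T1", "L1", True),
    ("T2", "L1", True),
    ("T3", "L1", False),
    ("T1", "L2", True),
]

l1_count = selectivity("L1", RECORDS)
l2_count = selectivity("L2", RECORDS)
```

In this example data, L2 binds a single target and would therefore be ranked as more selective than L1.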

In yet another aspect, the invention provides a method for selecting a therapy for a subject for the treatment, stabilization, or prevention of a disease or disorder. This method involves providing an electronic database including at least 10 records of target molecules correlated to records of the therapeutics and their ability to bind or modulate the activity of the target molecules, and determining a target molecule in the subject that has a mutation associated with the disease or disorder. A therapeutic is selected from the database that binds or modulates the activity of the target molecule and thereby treats, stabilizes, or prevents the disease or disorder. In other embodiments, the subject or a group of subjects having the mutation is selected for a clinical trial for the therapy or is classified in a particular subgroup for the clinical trial. In particular embodiments, the target molecule is a protein or nucleic acid. In one embodiment, the method is computer implemented.

In yet another aspect, the invention features another method for selecting a therapy for a subject for the treatment, stabilization, or prevention of a disease or disorder. This method involves providing an electronic database including at least 10 records of target molecules correlated to records of the therapeutics and their ability to bind or modulate the activity of the target molecules, and determining a target molecule in the subject that has a mutation associated with the disease or disorder. A therapeutic is selected from the database that does not bind or modulate the activity of the target molecule. In one embodiment, the mutation decreases the affinity of the target molecule for one or more therapeutics in the database and thus may decrease the efficacy of the therapeutic in that subject compared to subjects without the mutation. According to this embodiment, a therapeutic that binds a molecule other than the target molecule is selected. In other embodiments, the subject or a group of subjects having the mutation is excluded from a clinical trial for a therapeutic having decreased affinity for the mutant form of the target molecule, or the subject or a group of subjects is classified in a particular subgroup for the clinical trial. In yet other embodiments, the subject or a group of subjects having the mutation is selected for a clinical trial for a therapeutic that binds a molecule other than the target molecule, or the subject or a group of subjects is classified in a particular subgroup for the clinical trial. In particular embodiments, the target molecule is a protein or nucleic acid. In one embodiment, the method is computer implemented.

The invention also features improved methods for using mass spectrometry to determine whether a compound of interest is present in a sample. These methods may be used to identify ligands for particular target molecules.

In one such aspect, the invention provides a method of determining whether a compound of interest is present in a sample. This method involves determining or providing (i) reference mass spectra for two or more compounds from a library of compounds and (ii) a test mass spectrum of a sample including one or more compounds from the library. Whether or not one or more of the peaks of a reference mass spectrum are included in the test mass spectrum is determined, thereby determining whether the compound that generated the reference mass spectrum is present in the sample. In various embodiments, the reference mass spectra are sequentially or simultaneously analyzed until all of the peaks in the test mass spectrum have been assigned to a compound. In other embodiments, the determination of whether or not the peaks of a reference mass spectrum are included in the test mass spectrum includes a sequential determination of whether the peaks of one or more reference mass spectra are included in the test mass spectrum. In yet other embodiments, the determination of whether or not the peaks of a reference mass spectrum are included in the test mass spectrum is repeated until either (i) all of the peaks in the reference mass spectrum are determined to be present in the test mass spectrum, thereby determining that the compound that generated the reference mass spectrum is present in the sample, or (ii) a peak in the reference mass spectrum is determined to be absent in the test mass spectrum, thereby determining that the compound that generated the reference mass spectrum is not present in the sample.
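The reference-driven comparison just described might be sketched as follows. The sketch represents each spectrum as a list of mass to charge (m/z) values and treats two peaks as the same when they agree within a tolerance `tol`; the tolerance, its default value, the function names, and the example spectra are assumptions that do not come from the specification, and peak intensities are ignored.

```python
def peak_present(mz, test_peaks, tol=0.01):
    """True if the test spectrum has a peak within tol of the given m/z."""
    return any(abs(mz - t) <= tol for t in test_peaks)


def compound_present(reference_peaks, test_peaks, tol=0.01):
    """Check reference peaks one at a time, stopping at the first absent peak."""
    if not reference_peaks:
        return False  # an empty reference spectrum cannot confirm a compound
    for mz in reference_peaks:
        if not peak_present(mz, test_peaks, tol):
            return False  # case (ii): a reference peak is absent
    return True  # case (i): every reference peak was found


def identify_compounds(reference_spectra, test_peaks, tol=0.01):
    """Return the library compounds whose full reference spectrum is in the test."""
    return sorted(
        name
        for name, peaks in reference_spectra.items()
        if compound_present(peaks, test_peaks, tol)
    )


# Placeholder m/z values for three library compounds and one test sample.
REFERENCE_SPECTRA = {
    "A": [100.05, 101.05, 72.04],
    "B": [150.10, 122.09],
    "C": [200.20, 100.05],
}
TEST_PEAKS = [72.04, 100.05, 101.05, 122.09, 150.10]

present = identify_compounds(REFERENCE_SPECTRA, TEST_PEAKS)
```

Compound C shares one peak with compound A but is rejected because its 200.20 peak is missing from the test spectrum, which illustrates why all peaks of a reference spectrum are checked rather than a single one.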

In yet another aspect, the invention provides another method of determining whether a compound of interest is present in a sample. This method involves determining or providing (i) reference mass spectra of two or more compounds from a library of compounds and (ii) a test mass spectrum of a sample including one or more compounds from the library. One or more peaks of the test mass spectrum are analyzed to determine whether they are included in a reference mass spectrum. For a reference mass spectrum containing a peak that is present in the test mass spectrum, one or more of the other peaks in the reference mass spectrum are analyzed to determine whether they are present in the test mass spectrum, thereby determining whether the compound that generated the reference mass spectrum is present in the sample. In particular embodiments, the determination of whether the peaks in a reference mass spectrum are present in the test mass spectrum includes a sequential or simultaneous determination of whether the peaks of one or more reference mass spectra are included in the test mass spectrum. In other embodiments, the determination of whether a peak in a reference mass spectrum is present in the test mass spectrum is repeated until either (i) all of the peaks in the reference mass spectrum are determined to be present in the test mass spectrum, thereby determining that the compound that generated the reference mass spectrum is present in the sample, or (ii) a peak in the reference mass spectrum is determined to be absent in the test mass spectrum, thereby determining that the compound that generated the reference mass spectrum is not present in the sample.
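This test-driven variant might be sketched as below: each test peak is first used to find candidate reference spectra, and only those candidates are then checked in full. As in any such sketch, the m/z tolerance `tol`, the function name, and the example data are assumptions made for illustration, and intensities are not considered.

```python
def identify_from_test_peaks(reference_spectra, test_peaks, tol=0.01):
    """Start from each test peak, find candidate references, then verify them.

    reference_spectra maps compound name -> list of reference m/z values.
    Returns the compounds whose reference peaks are all found in the test.
    """
    def close(a, b):
        return abs(a - b) <= tol

    found = set()
    rejected = set()
    for t in test_peaks:
        for name, ref in reference_spectra.items():
            if name in found or name in rejected:
                continue  # this reference has already been decided
            # Step 1: does this reference contain the current test peak?
            if not any(close(t, r) for r in ref):
                continue
            # Step 2: verify the remaining reference peaks against the test.
            if all(any(close(r, x) for x in test_peaks) for r in ref):
                found.add(name)
            else:
                rejected.add(name)
    return sorted(found)


# Placeholder m/z values for three library compounds and one test sample.
REFERENCE_SPECTRA = {
    "A": [100.05, 101.05, 72.04],
    "B": [150.10, 122.09],
    "C": [200.20, 100.05],
}
TEST_PEAKS = [72.04, 100.05, 101.05, 122.09, 150.10]

present = identify_from_test_peaks(REFERENCE_SPECTRA, TEST_PEAKS)
```

Starting from the test peaks avoids examining reference spectra that share no peak with the sample, which may matter when the library is much larger than the number of compounds in any one sample.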

In various embodiments of either of the above methods of determining whether a compound of interest is present in a sample, the mass spectrum of each compound in the library is determined. In yet other embodiments, at least one of the peaks in the reference spectrum is an isotope peak, a fragment peak, or a parent peak. In particular embodiments, the method involves determining whether all of the peaks in a reference spectrum are present in the test mass spectrum. In other embodiments, the reference mass spectra are contained in a database including records of one or more properties of mass spectra correlated to records of compounds that generate the mass spectra. In particular embodiments, the database contains data on one or more properties selected from the group consisting of the mass to charge ratio of an isotope peak, the mass to charge ratio of a fragment peak, the mass to charge ratio of a parent peak, the intensity of an isotope peak, the intensity of a fragment peak, and the intensity of a parent peak. In still other embodiments, one or more of the steps for determining whether a peak in a test mass spectrum is present in a reference mass spectrum are computer implemented.

The invention also provides a computer-readable memory having stored thereon a program for determining whether a compound of interest is present in a sample. This computer-readable memory includes computer code that receives as input mass spectrometry data including the mass to charge ratio for one or more peaks in a reference mass spectrum (i.e., the mass spectrum of an individual compound from a library of compounds). This computer-readable memory also includes computer code that receives as input mass spectrometry data including the mass to charge ratio for one or more peaks in a test mass spectrum (i.e., the mass spectrum of a sample including one or more compounds from the library). The computer-readable memory also has computer code that determines whether the peaks of a reference mass spectrum are included in the test mass spectrum, thereby determining whether the compound that generated the reference mass spectrum is present in the sample.

In a related aspect, the invention features a computer-readable memory having stored thereon a program for determining whether a compound of interest is present in a sample. The memory includes computer code that receives as input mass spectrometry data including the mass to charge ratio for one or more peaks in a reference mass spectrum (i.e., the mass spectrum of an individual compound from a library of compounds), and computer code that receives as input mass spectrometry data including the mass to charge ratio for one or more peaks in a test mass spectrum (i.e., the mass spectrum of a sample including one or more compounds from the library). The memory also includes computer code that determines whether one or more peaks of the test mass spectrum are included in a reference mass spectrum, and computer code that determines whether all of the peaks in a reference mass spectrum are present in the test mass spectrum, thereby determining whether the compound that generated the reference mass spectrum is present in the sample.

The invention also features methods for the automated production of expression vectors or the automated production and purification of proteins.

In one such aspect, the invention features a method of producing two or more vectors encoding proteins of interest. This method involves robotically contacting a first nucleic acid encoding a first protein of interest with a first backbone nucleic acid in a robotic device under conditions that allow their reaction, thereby producing a first vector encoding the first protein, and robotically contacting a second nucleic acid encoding a second protein of interest with a second vector nucleic acid in the robotic device under conditions that allow their reaction, thereby producing a second vector encoding the second protein. In some embodiments, the method also includes robotically contacting the first vector with a first cell under conditions that allow the insertion of the first vector into the first cell, and robotically contacting the second vector with a second cell under conditions that allow the insertion of the second vector into the second cell. In various embodiments, at least 3, 4, 5, 8, 10, 15, 30, 60, 90, or more vectors are produced simultaneously. In other embodiments, the backbone nucleic acids are linearized expression vectors, and an insert encoding a protein of interest is ligated to the expression vector under conditions that generate a circularized expression vector containing the insert. In other embodiments, the first and second vectors or cells are contained in different flasks or wells in the robotic device. In other embodiments, the first cell expresses the first protein, and the second cell expresses the second protein. In yet other embodiments, the first protein and the second protein are purified as described in the aspect below. In other embodiments, the first cell and/or the second cell are bacteria such as E. coli, insect cells such as Drosophila cells, or mammalian cells such as Cos, HEK293, or CHO cells. In other embodiments, the first vector and the second vector are transferred from the first cell and the second cell to cells of another cell type, such as insect or mammalian cells, for the production of the first protein and the second protein. In other embodiments, a roller bottle system, stir tank system, capillary cell culture system, or bioreactor is used to grow the cells. The first vector and/or the second vector can be used to produce protein to be used in any of the methods of the invention (e.g., to identify ligands that bind the protein).

One protein production and/or purification method of the invention involves expressing a first protein in a first cell under conditions that result in the secretion of the first protein into a first medium in a robotic device and expressing a second protein in a second cell under conditions that result in the secretion of the second protein into a second medium in the robotic device. The robotic device transfers the first medium to a first chromatography column and transfers the second medium to a second chromatography column. In one embodiment, the first protein and the second protein are isolated, thereby purifying the first protein and the second protein. In various embodiments, at least 3, 4, 5, 8, 10, 15, 30, 60, 90, or more proteins are purified simultaneously. In other embodiments, the first and second cells are contained in different flasks or wells in the robotic device. In other embodiments, the first cell and/or the second cell are bacteria such as E. coli, insect cells such as Drosophila cells, or mammalian cells such as Cos, HEK293, or CHO cells. In other embodiments, the first cell and/or second cell are transiently transfected Cos, HEK293, Drosophila cells or CHO cells or stably transfected Cos, HEK293, CHO, E. coli, or Drosophila cells. In yet other embodiments, the first protein and/or the second protein are glycosylated in mammalian or insect cells. In various embodiments, the first protein or the second protein naturally contain a secretion signal or are genetically modified to contain a secretion signal so that they are secreted by the cells into the medium. The first protein and/or the second protein can be used in any of the methods of the invention (e.g., to identify ligands that bind the protein). In other embodiments, the robotic device can be used to contact the first protein and/or the second protein with a library of candidate ligands to select ligands that bind the protein(s) using any of the methods described herein. In yet other embodiments, the first protein and/or the second protein are used as members of a library of target molecules that are robotically contacted with a small molecule of interest to select the target molecules that bind the small molecule of interest using any of the methods described herein.

The invention also features linear DNA molecules that can be used in the automated production and purification of proteins. In one such aspect, the invention features a linear DNA molecule that is less than 3,500, 3,000, 2,000, 1,000, 750, 500, or 300 nucleotides in length and includes a promoter operably linked to a secretory or leader sequence. Preferred DNA molecules are labeled with topoisomerase (i.e., covalently or non-covalently bonded to topoisomerase). In a related aspect, the invention features a linear DNA molecule that is less than 3,000, 2,000, 1,000, 750, 500, or 300 nucleotides in length, includes a promoter, and is labeled with topoisomerase. In another aspect, the invention provides a linear DNA molecule that is less than 3,500, 3,000, 2,000, 1,000, 750, 500, or 300 nucleotides in length and includes a nucleic acid segment encoding an affinity tag (e.g., a histidine tag with, for example, 6, 10, or 12 histidines, a FLAG tag, a myc tag, or a GST tag) and a nucleic acid segment encoding a polyA region. Preferred DNA molecules are labeled with topoisomerase. Preferably, the DNA molecule of any of the above aspects is between 300 and 500 nucleotides in length.

The DNA molecules of the above aspects can be used to construct linear DNA molecules encoding proteins of interest. In one such aspect, the invention features a linear DNA molecule including a first promoter operably linked to (i) a nucleic acid segment encoding a first protein of interest and an affinity tag (e.g., a histidine tag with, for example, 6, 10, or 12 histidines, a FLAG tag, a myc tag, or a GST tag), and (ii) a first polyA region. Preferably, the nucleic acid segment encoding the first protein is operably linked to a secretory or leader sequence. In some embodiments, the DNA molecule is less than 3,000, 2,000, or 1,000 nucleotides in length. Preferably, the DNA molecule is labeled with topoisomerase. In some embodiments, the DNA molecule also includes a nucleic acid segment encoding a second protein of interest operably linked to the first promoter. In certain embodiments, the DNA molecule also includes a second promoter operably linked to (i) a nucleic acid segment encoding a second protein of interest and (ii) a second polyA region. The second protein of interest may or may not have an affinity tag (e.g., a histidine tag with, for example, 6, 10, or 12 histidines, a FLAG tag, a myc tag, or a GST tag). In other embodiments, the DNA molecule encodes 3, 4, 5, 6, or more different proteins.

In another aspect, the invention features a method of producing a linear DNA molecule encoding a protein of interest. This method involves robotically contacting (i) a linear, topoisomerase labeled DNA molecule that has a promoter, (ii) a linear DNA molecule encoding a first protein of interest, and (iii) a linear, topoisomerase labeled DNA molecule that has a nucleic acid segment encoding an affinity tag (e.g., a histidine tag, a FLAG tag, a myc tag, or a GST tag) and a nucleic acid segment encoding a polyA region in a first compartment in a robotic device under conditions that permit their reaction, thereby producing a first linear DNA molecule encoding the first protein. In preferred embodiments, the method also includes robotically contacting (i) a linear, topoisomerase labeled DNA molecule that has a promoter, (ii) a linear DNA molecule encoding a second protein of interest, and (iii) a linear, topoisomerase labeled DNA molecule that has a nucleic acid segment encoding an affinity tag (e.g., a histidine tag, a FLAG tag, a myc tag, or a GST tag) and a nucleic acid segment encoding a polyA region in a second compartment in the robotic device under conditions that permit their reaction, thereby producing a second linear DNA molecule encoding the second protein. Preferably, the method also involves robotically contacting the first linear DNA molecule with a first cell under conditions that allow the insertion of the first linear DNA molecule into the first cell, and robotically contacting the second linear DNA molecule with a second cell under conditions that allow the insertion of the second linear DNA molecule into the second cell. In some embodiments, the first and/or second linear DNA molecule is circularized (e.g., ligated using standard methods) prior to insertion into a cell. Preferably, the first cell expresses the first protein, and the second cell expresses the second protein. In other preferred embodiments, at least 3, 4, 5, 8, 10, 15, 30, 60, 90, or more linear DNA molecules are produced simultaneously. Preferably, each topoisomerase labeled DNA molecule is less than 3,000, 2,000, 1,000, 750, 500, or 300 nucleotides in length.

In another aspect, the invention features a method of purifying a protein. This method involves expressing a first protein in a first cell including a linear DNA molecule of the invention (or a circularized version of a linear molecule of the invention) under conditions that result in the secretion of the first protein into a first medium in a robotic device, robotically transferring the first medium to a first chromatography column, and purifying the first protein. In preferred embodiments, the method also includes expressing a second protein in a second cell including a linear DNA molecule of the invention (or a circularized version of a linear molecule of the invention) under conditions that result in the secretion of the second protein into a second medium in the robotic device, robotically transferring the second medium to a second chromatography column, and purifying the second protein. In other preferred embodiments, at least 3, 4, 5, 8, 10, 15, 30, 60, 90, or more proteins are purified simultaneously.

In another aspect, the invention features a cell or cell line transfected (e.g., stably or transiently transfected) with a nucleic acid of the invention. In various embodiments, the cell is a bacterium such as E. coli, an insect cell such as a Drosophila cell, or a mammalian cell such as a Cos, HEK293, or CHO cell.

In another aspect, the invention features a CHO cell that is transiently transfected with a nucleic acid encoding an mRNA or protein of interest. In some embodiments, the transfected nucleic acid is a linear DNA molecule, such as a linear DNA molecule of the invention. Preferably, the cell is transiently or stably transfected with a nucleic acid encoding SV40 T antigen.

In various embodiments of any of the aspects of the invention, the ligand binds a target molecule covalently or non-covalently. In other embodiments, the ligand directly binds the target molecule or binds another molecule in the same pathway as the target molecule and thereby activates or inhibits the target molecule. In other embodiments, the ligand has a molecular weight of less than 5000, 4000, 3000, 2000, 1000, 750, 500, or 250 daltons. In other embodiments, the ligand has fewer than 5, 4, 3, or 2 hydrogen-bond donors or fewer than 10, 8, 6, 4, or 3 hydrogen-bond acceptors. In yet other embodiments, the ligand has a clogP of less than 4.15. In still other embodiments, the ligand is not FK506. In other embodiments, the selected candidate ligands bind the target molecule with a K_d of less than 1 fM, between 1 fM and 1 nM, between 1 nM and 1 μM, or less than 1 μM. In other embodiments, the selected candidate ligands are subjected to analysis by IR, MS, NMR, UV, amino acid sequencing, nucleic acid sequencing, or a combination thereof. In other embodiments, an isotope or fragment peak is used to identify a candidate ligand that has the same mass as another candidate ligand in the library.
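The physicochemical limits recited above can be pictured as a simple filter over a candidate list. The following is a minimal sketch only: the `Ligand` record, the compound names, and their property values are hypothetical, and applying the limits on molecular weight, hydrogen-bond donors, hydrogen-bond acceptors, and clogP together is an assumption made for illustration (the text recites them as separate embodiments).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    """Hypothetical record of the properties named in the text."""
    name: str
    mol_weight: float        # daltons
    h_bond_donors: int
    h_bond_acceptors: int
    clogp: float


def passes_filter(lig, max_mw=500.0, max_donors=5, max_acceptors=10, max_clogp=4.15):
    """Return True if the ligand is strictly below every limit.

    The default limits are one choice from the lists recited in the text;
    combining them in a single test is an assumption of this sketch.
    """
    return (
        lig.mol_weight < max_mw
        and lig.h_bond_donors < max_donors
        and lig.h_bond_acceptors < max_acceptors
        and lig.clogp < max_clogp
    )


# Made-up example compounds, not data from the specification.
candidates = [
    Ligand("cmpd_A", 342.4, 2, 5, 2.90),
    Ligand("cmpd_B", 812.0, 6, 12, 5.30),
    Ligand("cmpd_C", 498.6, 4, 9, 4.10),
]

selected = [lig.name for lig in candidates if passes_filter(lig)]
print(selected)
```

Any of the alternative limits listed above (for example, a 250 dalton molecular weight limit) can be substituted through the keyword arguments.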

In various other embodiments of any of the aspects of the invention, candidate ligands and/or the target molecules are in solution phase. In other embodiments, the ligand or the target molecule is immobilized on a solid surface such as a bead or chip. In other embodiments, the assay medium is fractionated by chromatography. In particular embodiments, the complex is isolated using size exclusion (e.g., using silica or polymer resin), multimodal, bimodal, or biphasic chromatography (e.g., chromatography based on more than a single characteristic such as size exclusion and reverse phase, size exclusion and anionic exchange, size exclusion and cation exchange, or chromatography using an internal surface reverse phase (ISRP), GFF, or GFFII resin). Exemplary resins include diol, sepharose, superose, and polymethyl methacrylate. Other desirable resins are stable above 5, 50, 500, 5000, or 7000 psi. In particular embodiments, columns containing resins with different separation characteristics are combined in series. In other embodiments, column chromatography is used to isolate the complex, and the complex elutes from the column in less than 60, 30, 20, 15, 10, 5, 3, 2, or 1 minute; the void volume is less than 20, 15, 10, 5, 4, 3, 2, or 1 mL; or the column diameter is less than 5, 4, 3, 2, or 1 mm. In other embodiments, HPLC, spin columns, capillary chromatography, or filtration is used to isolate the complex. In other embodiments, a decrease in the UV absorbance of an HPLC or other chromatography peak corresponding to unbound ligand is used to detect a decrease in the amount of unbound ligand (and thus an increase in the amount of bound ligand). In still other embodiments, the complex of a target molecule and bound candidate ligands is subjected to a chromatography step that separates the bound ligands from the target molecule. In yet other embodiments of any of the aspects of the invention, an immobilized target is contacted with candidate ligand(s), and the support is washed with medium lacking candidate ligands and treated in a manner that releases any bound ligands from the target. In still other embodiments, following exposure of the target to the candidate ligand(s), the support is washed with medium lacking target molecules, and treated in a manner that dislodges the candidate ligand molecules and any bound target molecules from the support. In other aspects, one, multiple, or all the steps in the method are robotically automated or computer implemented.
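The UV-based readout described above (a drop in the free-ligand peak indicating more bound ligand) can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name and the peak areas are hypothetical, and it assumes the free-ligand peak area is proportional to the amount of unbound ligand and that a run without target is available as a reference.

```python
def fraction_bound(free_area_without_target, free_area_with_target):
    """Estimate the fraction of a ligand bound to the target.

    Assumes peak area scales linearly with the amount of unbound ligand,
    so the relative loss of area equals the bound fraction.
    """
    if free_area_without_target <= 0:
        raise ValueError("reference peak area must be positive")
    if free_area_with_target < 0:
        raise ValueError("peak area cannot be negative")
    frac = 1.0 - free_area_with_target / free_area_without_target
    # Clamp to [0, 1] so that measurement noise cannot yield impossible values.
    return max(0.0, min(1.0, frac))


# Hypothetical peak areas (arbitrary units): one ligand whose free peak
# shrinks markedly when target is present, and one that barely changes.
binder = fraction_bound(1200.0, 300.0)
non_binder = fraction_bound(1000.0, 990.0)
print(f"binder: {binder:.2f}, non-binder: {non_binder:.2f}")
```

A ligand whose free peak is essentially unchanged in the presence of target, like the second example, would be treated as a non-binder under this readout.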

In still other embodiments of any of the aspects of the invention, the function or activity of a selected target is characterized by a chemical assay, biochemical assay, enzymatic assay, biological assay, or a combination thereof. In particular embodiments, the target function is characterized by an apoptosis assay, proliferation assay, necrosis assay, angiogenesis assay, invasion assay, or a combination thereof. In other embodiments, the candidate target molecules are isolated from biochemical extracts, cells, tissues, organisms, or recombinant sources. In yet other embodiments, a selected target molecule is identified using NMR, IR, UV, MS (e.g., MALDI-TOF, MALDI, single quad, triple quad, or electrospray MS or MS-MS), amino acid sequencing, or nucleic acid sequencing. In other embodiments, the candidate target molecule is a full-length protein or a fragment from a protein that is less than full-length. Exemplary targets include enzymes and receptors such as GPCRs, kinases, ion channels, nuclear receptors, proteases, phosphatases, and methylases. Targets may include molecules or classes of molecules for which therapeutically active compounds have or have not been previously developed.

In various embodiments, a method or database of the invention is used to determine specificity of a scaffold for a target, determine potential toxicity, identify a compound to probe a particular biology or pathology, identify a compound to probe a target, perform mini SAR, select a target responsible for action of a particular compound, support “greening” of a portfolio and patent life extension for products (e.g., identifying other uses for patented compounds, identifying other target molecules that patented compounds bind, or identifying other compounds that bind useful targets), select a compound based on pharmacogenetics, or select scaffolds to serve as leads for optimization of a drug.

It is noted that all of the embodiments of the various aspects of the invention for candidate ligands apply to small molecules of interest.

Herein, by “target molecule that has not been previously validated as a drug target” is meant a target molecule whose modulation has not been previously experimentally determined to promote or inhibit a disease state in an animal model of the disease, as described in a publication or public presentation. For example, unvalidated target molecules include molecules for which the activation or inhibition of the molecules or the decrease or increase in the expression level of the molecules has not been experimentally shown to modulate a disease state in an animal model of the disease. In contrast, validated drug targets include molecules for which increasing or decreasing the amount or an activity of the molecules has been experimentally determined to promote or inhibit a disease state in an animal model. Examples of validated targets include targets whose overexpression or inactivation due to a knockout mutation or other gene silencing methods (e.g., antisense inhibition of gene expression) has been experimentally demonstrated to promote or inhibit a disease state in an animal model.

By “target molecule of unknown biological function” is meant a target molecule for which an activity has not been previously experimentally demonstrated, as described in a publication or public presentation. In various embodiments, the target molecule of unknown function is a nucleic acid or protein having less than 60, 50, 40, 30, 20, or 10% sequence identity to nucleic acids or proteins for which an activity has been experimentally demonstrated. In other embodiments, the nucleic acid or protein has not previously been assigned a putative function. Sequence identity is typically measured using sequence analysis software with the default parameters specified therein (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705). This software program matches similar sequences by assigning degrees of homology to various substitutions, deletions, and other modifications.
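The identity thresholds above presuppose a way of computing percent identity from an alignment. The sketch below shows one simple convention (identical positions divided by aligned columns, with gap-containing columns counted as mismatches). It is not the algorithm of the software package named above; the function name and the example sequences are hypothetical, and other conventions for the denominator exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; columns where
    only one sequence carries a gap count as mismatches (an assumption
    of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


# Hypothetical aligned peptide fragments: 5 identical positions out of 7.
identity = percent_identity("MKT-LLV", "MKTALIV")
below_60_percent = identity < 60.0
print(f"{identity:.1f}% identical; below 60% threshold: {below_60_percent}")
```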

By “target molecule of unknown secondary or tertiary structure” is meant a target molecule for which the secondary or tertiary structure has not been previously experimentally determined, as described in a publication or public presentation. In some embodiments, the secondary or tertiary structure has not previously been predicted or modeled based on the known structure of a homologous molecule. In other embodiments, the location or tertiary structure of a binding site or active site in the target molecule has not been previously experimentally determined.

By “scaffold” is meant a core chemical structure that is contained in two or more different molecules in a library of candidate compounds. In various embodiments, at least 5, 10, 10², 10³, 10⁴, 10⁵, 10⁶, or more molecules in the library contain the scaffold. In some embodiments, the library contains at least 2, 5, 10, 10², 10³, 10⁴, 10⁵, or more different scaffolds.

By “library” is meant a collection of 2, 5, 10, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, or more different molecules. In various embodiments, each member of a library has a different mass. In other embodiments, at least 2, 5, 10, 15, 20, 30, 40, 50, or more of the members have the same mass or a mass that differs by less than 1, 0.5, 0.1, 0.05, or 0.01 daltons from the mass of another library member.

By “crosslinker” is meant a molecule or moiety that contains one or more functional groups capable of reacting with another molecule.

By “proteome” is meant all the proteins expressed by an organism. The proteome includes all of the alternative splice variants of a protein that are expressed by the organism.

By “purified” is meant separated from other components that naturally accompany it. Typically, a compound is substantially pure when it is at least 50%, by weight, free from proteins, antibodies, and naturally-occurring organic molecules with which it is naturally associated. In other embodiments, the compound is at least 75%, 90%, or 99%, by weight, pure. A substantially pure compound may be obtained by chemical synthesis, separation of the compound from natural sources, or production of the compound in a recombinant host cell that does not naturally produce the compound. Proteins and organic compounds may be purified by one skilled in the art using standard techniques such as those described by Ausubel et al. (Current Protocols in Molecular Biology, John Wiley & Sons, New York, 2000). The degree of purification compared to the starting material can be measured using standard methods such as polyacrylamide gel electrophoresis, column chromatography, optical density, HPLC analysis, or western analysis (Ausubel et al., supra). Exemplary methods of purification include immunoprecipitation, column chromatography such as immunoaffinity chromatography, magnetic bead immunoaffinity purification, and panning with a plate-bound antibody.

The methods of the present invention have numerous advantages. For example, the methods allow the expression and purification of every protein in the proteome of an organism (e.g., the human proteome) and the identification of high-affinity, drug-like scaffolds for each protein. The methods also allow a theoretically unlimited number of candidate compounds and candidate scaffolds to be screened. Because the methods of the invention are so rapid and can be performed on such a large scale, they are useful for assaying target molecules that have not been previously validated as drug targets or target molecules of unknown biological function to select ligands that bind and/or modulate the activity of the target molecules. In contrast, current methods for selecting ligands that bind a target molecule have been limited to target molecules that have been validated as drug targets. Thus, the present methods greatly expand the number of target molecules that can be assayed. Target molecules for which high affinity binders are selected can then be validated as drug targets.

Additionally, the methods of the invention allow candidate ligands that have the same mass to be distinguished. For example, mass spectral isotope and fragment peaks typically differ between ligands of the same mass. Thus, these peaks can be used to identify a candidate ligand even if it has the same parent peak as another candidate ligand in a library of compounds. This advantage allows the use of libraries containing multiple compounds of the same or similar masses.
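As an illustration of how same-mass library members might be flagged and then told apart by fragment peaks, consider the following minimal sketch. The compound names, masses, and fragment m/z values are invented for the example, the 0.01 dalton tolerance is one of the values recited in the definition of “library”, and exact matching of fragment values stands in for the m/z tolerance window that real peak matching would need.

```python
from itertools import combinations


def isobaric_pairs(library, tolerance_da=0.01):
    """Return name pairs whose masses differ by less than the tolerance."""
    return [
        (a, b)
        for (a, mass_a), (b, mass_b) in combinations(sorted(library.items()), 2)
        if abs(mass_a - mass_b) < tolerance_da
    ]


def best_match(observed_peaks, fragment_table):
    """Pick the compound sharing the most fragment peaks with the spectrum.

    Ties resolve to the first compound in the table; a real implementation
    would match peaks within an m/z tolerance rather than exactly.
    """
    return max(fragment_table, key=lambda name: len(fragment_table[name] & observed_peaks))


# Hypothetical library: two members are indistinguishable by parent mass.
library = {"cmpd_1": 300.1234, "cmpd_2": 300.1290, "cmpd_3": 415.2000}
pairs = isobaric_pairs(library)

# Hypothetical fragment peaks for the two isobaric members.
fragments = {
    "cmpd_1": {121.05, 179.07},
    "cmpd_2": {105.03, 195.09},
}
observed = {121.05, 179.07, 300.12}
identified = best_match(observed, fragments)
print(pairs, identified)
```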

The solution phase embodiments of the invention allow fluid phase binding to occur as it would in a serum or cell. In contrast to many current assays which measure a specific activity of the target protein, the methods of the present invention may be readily applied to any target in the proteome without customization. The methods also use a very small amount of reagents (such as <300 μg of each target for 200,000 compounds, and <35 ng of each compound for each target). The methods also allow a library of compounds to be screened without tagging or purifying individual members of the library before screening, thereby greatly decreasing the amount of time necessary to screen the library. The length of time required to screen libraries can also be reduced by using the automated embodiments of the present invention which allow multiple libraries and/or multiple targets to be analyzed in parallel.
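The reagent quantities quoted above can be put in per-assay terms with a short calculation. This sketch treats the quoted limits as upper bounds; the figure of 90,000 targets is an assumption borrowed from the approximate proteome size given in the description of FIG. 19, and the variable names are hypothetical.

```python
UG_TO_NG = 1_000   # nanograms per microgram
NG_TO_MG = 1e-6    # milligrams per nanogram

# Upper bounds quoted in the text.
target_per_screen_ug = 300        # < 300 ug of each target ...
compounds_per_screen = 200_000    # ... for 200,000 compounds
compound_per_target_ng = 35       # < 35 ng of each compound per target

# Assumption: screening every compound against ~90,000 proteins.
assumed_targets = 90_000

# Target consumed per compound screened (upper bound, in ng).
target_per_compound_ng = target_per_screen_ug * UG_TO_NG / compounds_per_screen

# Total amount of one compound needed to cover all assumed targets (in mg).
compound_total_mg = compound_per_target_ng * assumed_targets * NG_TO_MG

print(f"target per compound: < {target_per_compound_ng:.1f} ng")
print(f"compound across all targets: < {compound_total_mg:.2f} mg")
```

Under these assumptions, each compound consumes less than 1.5 ng of a given target, and screening one compound against all 90,000 assumed targets requires less than about 3.15 mg of that compound.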

Other advantages and embodiments of the invention will be apparent from the following detailed description and from the claims.

4. DESCRIPTION OF THE FIGURES

FIG. 1 is an overview of the “genotype to phenotype” approach.

FIG. 2 is an overview of the “phenotype to genotype” approach.

FIG. 3 is a set of spectra illustrating the ability of p38 MAP kinase to isolate and extract a specific ligand with micromolar affinity.

FIG. 4 is a set of UV spectra illustrating a p38 MAP kinase concentration-dependent reduction of the 86002 peak but negligible reduction of the quinine peak in the HPLC separation of protein-bound compounds from free compounds.

FIG. 5 is a set of mass spectra illustrating that the compound extracted from the mixture and released from p38 MAP kinase was identified as 86002.

FIG. 6 is a list of the compounds in the 10 compound mixture and their molecular weights.

FIG. 7 is a set of spectra demonstrating a p38 concentration-dependent reduction of the 86002 peak but negligible reduction of the colchicine peak or peaks representing the other compounds in the mixture during the HPLC separation of protein-bound compounds from free compounds. When the protein fraction was collected and the mass spectrum was determined, the spectrum included the peaks characteristic of 86002 at a level far higher than other peaks.

FIG. 8 is a set of spectra illustrating a tubulin concentration-dependent reduction of the colchicine peak but negligible reduction of the 86002 peak or peaks representing the other compounds in the mixture during the HPLC separation of protein-bound compounds from free compounds. When the protein fraction was collected and the mass spectrum was determined, the spectrum included the peaks characteristic of colchicine at a level far higher than other peaks.

FIG. 9 is a list of the compounds in the 100 compound mixture and their molecular weights.

FIG. 10 is a set of spectra illustrating that p38 MAP kinase binds and extracts a ligand with micromolar affinity (86002) from a 100 compound mixture in a specific and concentration-dependent manner.

FIG. 11 is a set of spectra illustrating that tubulin binds and extracts a hit (colchicine) from a 100 compound mixture in a specific and concentration-dependent manner.

FIG. 12 is a set of UV spectra illustrating that excellent separation of the protein target from the unbound compounds in the 100 compound mixture is also achieved at higher flow rates.

FIG. 13 is a set of spectra illustrating the ability of spin columns to separate a compound bound to a protein target from unbound compounds. This method was used to identify colchicine as the predominant compound from the 100 compound mixture that bound tubulin.

FIG. 14 is a schematic illustration of the steps in one embodiment of the Chemical Array Assay.

FIG. 15 is a schematic illustration of an exemplary computer.

FIG. 16 is an exemplary flow chart for one embodiment of the invention for identifying a compound in a sample.

FIG. 17 is a graph illustrating the pairing of chemical scaffolds with protein targets which can be used to produce a chemical fingerprint of the human proteome. The binding assays and the databases of the invention have many applications. For example, they may be used to determine specificity of a scaffold for a target, determine potential toxicity, identify a compound to probe a particular biology or pathology, identify a compound to probe a target, perform mini SAR, select a target responsible for action of a particular compound, support “greening” of a portfolio and patent life extension for products (e.g., identifying other uses for patented compounds, identifying other target molecules that patented compounds bind, or identifying other compounds that bind useful targets), select a compound based on pharmacogenetics, or select scaffolds to serve as leads for optimization of a drug. Knowledge in a database of chemical interactions with targets at the proteomic scale allows selection of better leads and validation of genomics based targets.

FIG. 18 is a schematic illustration of one embodiment for the automation and high throughput of methods of the invention to produce ligand/target pairs.

FIG. 19 is a schematic illustration of one embodiment for the high throughput production of ˜2 milligrams of each of the ˜90,000 proteins in the human proteome using automated cloning and production systems over a period of ˜3 years at a rate of ˜600 proteins per week.

FIG. 20 is a schematic illustration of steps involved in the high-throughput methods of the present invention for the generation of expression vectors.

FIG. 21 is a table of exemplary proteins that have been produced with proper translational modifications using standard methods in Drosophila cells.

FIG. 22 is a schematic illustration of steps involved in the high-throughput methods (e.g., ˜two hour) of the present invention for the generation of linear expression vectors.

FIG. 23 is a schematic illustration of steps involved in the high-throughput methods of the present invention for identifying high affinity ligands for a target molecule of interest. A library of compounds and the target molecule are applied to one of the binding assays described herein and the highest affinity compounds are selected. It is not necessary to initially bias the screening library because a large number of scaffolds can be screened to find high affinity compounds, many of which may not have been predicted to bind with such high affinity. Compounds with even higher affinity for the target can be generated by reacting the selected compounds with each other in the presence of the target molecule. Compounds that react with each other while bound to the target molecule may form products with increased affinity for the target molecule because of the larger number of functional groups in the product that interact with the target molecule. The products with highest affinity for the target molecule can be selected using one of the binding assays described herein. Thus, there is no need for chemical purification of the products using traditional methods prior to this binding assay.

FIG. 24 is a schematic illustration of the binding of building blocks with reactive groups (e.g., small molecules identified from a binding assay or small molecules from a library of compounds) to a target protein. The building blocks are reacted on the surface of the protein to generate a product with higher affinity for the protein.

FIG. 25 is a schematic illustration of the binding of building blocks without reactive groups (e.g., small molecules identified from a binding assay or small molecules from a library of compounds) to a target protein. The building blocks are reacted on the surface of the protein to generate a product with higher affinity for the protein.

FIG. 26 is a schematic illustration of parallel processing by injecting and assaying multiple samples at once.

FIG. 27 is a list of exemplary sequences for use in the linear DNA constructs of the invention, including SEQ ID NOs: 1-9.

5. DETAILED DESCRIPTION OF THE INVENTION

5.1. Genotype to Phenotype

In one aspect, the present invention relates to methods of exposing protein or nucleic acid targets to a plurality of potential ligands, collecting ligand-target pairs, and using the ligand(s) which bind the target to analyze the target's biological function. One embodiment is outlined in FIG. 1. The method is used to determine the function of a target, which may be a target which has hitherto been unknown. Many other methods for selecting a candidate ligand that binds a target molecule are described herein. All of the embodiments listed below in sections 5.1.1 to 5.1.5 can be used in any of the methods of the invention.

5.1.1. Targets

According to the present invention, a target molecule is the compound for which a binding or reacting molecule is sought. In preferred embodiments, the target is the species present at the highest concentration in the reaction vessel. In various preferred embodiments, the target is present at the same concentration as the ligand in the reaction vessel. In yet other preferred embodiments, the target is present at a higher or a lower concentration than the concentration of each ligand or the total concentration of the mixture of candidate ligands. In other preferred embodiments, the target is the species present at the lowest concentration in the reaction vessel. In one embodiment of the invention, the target is the species in the reaction vessel which has the highest molecular mass. A target may be a naturally occurring biomolecule synthesized in vivo or in vitro. A target may be comprised of amino acids, nucleic acids, sugars, lipids, natural products or combinations thereof. An advantage of the instant invention is that no prior knowledge of the identity or function of the target is necessary.

In a preferred embodiment of the invention, the target is comprised of amino acids, peptides, enzymes, proteins (e.g., membrane or soluble proteins), antibodies or combinations thereof. In a first step, polynucleotides encoding the proteins of interest may be selected and introduced into an expression system. The polynucleotides may be selected by differential screening, subtractive hybridization, differential display, microarray expression analysis, representational difference analysis (RDA) or laser capture microdissection. The protein may be synthesized in vivo as in a bacterial plasmid, phage, transient cellular expression system or viral expression system. Alternatively, selected proteins may be synthesized in vitro by in vitro transcription and translation (e.g., Promega web site) or by common Fmoc oligopeptide synthesis chemistry. The expressed protein may be optionally purified and then exposed to a ligand library.

According to the invention, genes can be expressed from a complete cDNA or gene library of human or other species or a subset of genes selected for differential expression in a particular disease or upon a particular stimulus. Genes that are differentially expressed in diseased or stimulated cells and tissues can be selected using techniques including, but not limited to, subtractive hybridization, informatics, microarrays, SAGE, or laser capture microdissection. If partial sequences such as ESTs are recovered, full-length tissue specific cDNAs may then be cloned from full-length human cDNA libraries, some of which are available from CLONTECH, STRATAGENE, Life Technologies, and NCBI. Between 20% and 60% of the genes being cloned in this way, depending upon the tissue, have not previously been identified and the functions of virtually every gene cloned have not been elucidated. In a preferred embodiment, these genes have been discovered by genomics. To produce proteins, the full-length cDNAs may be tagged with hexahistidine (6his) inserted at the carboxyl terminal end and glutathione S-transferase (GST) at the amino terminal end of the gene, each with a protease cleavage site. Alternatively, the intein-based self cleaving tag by New England Biolabs may be used to avoid the need for protease treatment. These genes may be expressed and secreted into the supernatant by baculovirus, for example, using the Invitrogen-Schneider 2 Drosophila system with its his tag and BiP protein leader, transfection using CaPO₄, selection with hygromycin, and expression induced with copper sulfate, which can produce 5-10 mg/L of protein in the supernatant that can be purified over a nickel column. Non-limiting examples of alternative expression systems include Fast Bac or another baculoviral system or mammalian expression systems (CHO, COS, 293, etc.). E. coli may also be used for protein production but does not glycosylate proteins, whereas the baculovirus system is as reliable and does glycosylate proteins. The resulting proteins can then be purified by Ni²⁺-NTA chromatography as a first purification step and glutathione affinity chromatography as a second step, followed by removal of the tags by specific protease cleavage. If the intein based affinity system is used, no protease is required. The proteins can be expressed and purified using alternative techniques as well, or the complete or partial protein may be expressed in phage or bound to a surface.

In another embodiment of the invention, targets are comprised of RNA or DNA as oligonucleotides or polynucleotides. In one non-limiting embodiment of the present invention, nucleic acids to be introduced into an expression system are identified by large scale sequencing of ESTs. Oligonucleotide targets may be synthesized directly. Polynucleotide targets may be synthesized directly or prepared by amplification of a template polynucleotide, e.g., by PCR. The oligonucleotide or polynucleotide target may be optionally purified and then exposed to a ligand library.

In another embodiment of the invention, targets are comprised of simpleor complex carbohydrates. In another embodiment of the invention,targets are comprised of lipids. In another embodiment of the invention,the target comprises natural products.

In another embodiment of the invention, the target may be derivatized.Non-limiting examples include biotin, fluorescein, digoxygenin, greenfluorescent protein, radioisotope, his tag, magnetic bead, glutathione Stransferase, photoactivatible crosslinker or combinations thereof.

Target preparations may contain minor quantities of other compounds as aresult of partial or incomplete purification of the desired component.

5.1.2. Ligands

According to the present invention, a ligand is any molecule which has the potential to bind to a target and/or exert an effect in a bioassay. In various embodiments of the genotype to phenotype approach, the ligand or the mixture of candidate ligands is present in the reaction vessel at a lower concentration than the target. In other embodiments of the phenotype to genotype approach, the ligand or the mixture of candidate ligands is present in the reaction vessel at the same concentration as the target. In still other embodiments of the genotype to phenotype approach, the ligand or the mixture of candidate ligands is present in the reaction vessel at a higher concentration than the target. A ligand may be comprised of amino acids, nucleic acids, sugars, lipids, natural products, natural product-like compounds or combinations thereof. A ligand may be created by any combinatorial chemical method. Alternatively, a ligand may be a naturally occurring biomolecule synthesized in vivo or in vitro. The ligand may be optionally derivatized with another compound. One advantage of this modification is that the derivatizing compound may be used to facilitate ligand-target complex collection or ligand collection, e.g., after separation of ligand and target. Non-limiting examples of derivatizing groups include biotin, fluorescein, digoxygenin, green fluorescent protein, isotopes, polyhistidine, magnetic beads, glutathione S transferase, photoactivatable crosslinkers or combinations thereof.

Ligands should have low affinity for each other under the conditions in which the target is exposed to the ligand library.

Ligand libraries are mixtures of ligands which differ from each other in mass, composition, structure or combinations thereof. The present invention contemplates such libraries which comprise at least 10 different ligands or at least 100 different ligands or at least 1000 different ligands.

The ligand library used to bind to the proteins can be derived from many sources. The invention includes the use of chemicals, proteins, peptides, antibodies, sugars, lipids, natural products, natural product-like compounds or any combination thereof. These may be prepared by organic synthesis, combinatorial chemistry, recombinant DNA, biochemical extraction, purification, etc. In a preferred embodiment of the invention, natural product-like synthetic libraries are generated using diversity oriented chemistry (e.g., asymmetric split pool synthesis on beads or in solution, synthesized in parallel or in series), either combinatorial or medicinal chemistry. The subunits used in the synthesis are preferably drug-like and are as highly diversified as possible. The units may be structurally rigid or flexible. The units may undergo chemical reactions that modify their own structures (e.g., rearrangement). The units may have functional groups added.

Drug-like compounds may be made using different scaffolds with different chemistries (e.g., organic, inorganic, peptide, protein, alkaloid, carbohydrate, lipids, natural product-like compounds). Drug-like compounds may incorporate spectral identifiers. Non-limiting examples of spectral identifiers include elements which resolve into characteristic isotope fragmentation patterns in mass spectroscopy (e.g., Cl, Br, N, H). Drug-like compounds may also be made with compounds with unique fragmentation patterns upon mass spectroscopy analysis (e.g., penicillin). The libraries can also be designed to facilitate other analytical and deconvolution techniques (e.g., IR, FTIR).

In another embodiment of the invention, non-limiting examples of other libraries which may be used include commercially available libraries (e.g., Pharmacopeia, ArQule, and Chembridge), focused chemical libraries, peptides, peptides or proteins including the TAT, VP22 or ANTENNAPEDIA transduction signals, structurally flexible small molecules, natural products, sugars, and monoclonal antibodies. The subunits used in the synthesis are preferably drug-like and are as highly diversified as possible.

Libraries of the invention may be tagged to facilitate ligand deconvolution and resynthesis after binding has been observed. Alternatively, the ligands can be deconvoluted without tagging. The ligands can be tested individually or in a mixture. Diverse libraries synthesized as a mixture in solution phase or on solid phase supports can be used. In one embodiment, the transduction peptides or variants thereof from TAT, VP22 or ANTENNAPEDIA can be crosslinked to a small molecule to enhance its ability to cross a membrane or barrier. Alternatively, a small molecule homologue of these peptides can be developed and linked to the same.

5.1.3. Binding

According to the present invention, a ligand-target pair describes an affinity relationship between a ligand and target wherein the dissociation constant (K_(d)) is less than about 20 μM, and preferably less than about 1 μM. The invention further contemplates ligand-target interactions where K_(d)≦100 nM or K_(d)≦100 μM or K_(d)≦100 fM. The interaction between the ligand and target may be covalent or non-covalent. The ligand of a ligand-target pair may or may not display affinity for other targets. The target of a ligand-target pair may or may not display affinity for other ligands.
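
By way of illustration only, the relationship between K_(d) and the amount of ligand-target complex formed can be sketched numerically. The following Python sketch assumes simple reversible 1:1 binding of one ligand to one target with no competition among library members; these assumptions, the function names, and the example concentrations are introduced here for illustration and are not taken from the description above.

```python
import math


def complex_concentration(target_total, ligand_total, kd):
    """Equilibrium concentration of a 1:1 ligand-target complex.

    All three arguments must share one concentration unit (e.g., uM).
    Solves C^2 - (T + L + Kd) * C + T * L = 0 and returns the smaller,
    physically meaningful root.
    """
    b = target_total + ligand_total + kd
    return (b - math.sqrt(b * b - 4.0 * target_total * ligand_total)) / 2.0


def fraction_ligand_bound(target_total, ligand_total, kd):
    """Fraction of the total ligand bound to the target at equilibrium."""
    return complex_concentration(target_total, ligand_total, kd) / ligand_total


if __name__ == "__main__":
    # Hypothetical example: 5 uM target and 5 uM ligand at several K_d values.
    for kd in (20.0, 1.0, 0.1):
        print(kd, round(fraction_ligand_bound(5.0, 5.0, kd), 3))
```

Under these assumptions, a lower K_(d) yields a larger bound fraction at fixed target and ligand concentrations, which is one reason affinity thresholds such as those recited above are useful when selecting ligand-target pairs.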

According to the invention a reaction vessel is any container or surface in or upon which a target may be exposed to at least one ligand. In a preferred embodiment of the invention, reaction vessels are arranged to facilitate high throughput screening. This may be accomplished by using 96 or 384 well microtitre plates. Another possibility is depositing different target proteins on a glass slide at high density as illustrated by MacBeath et al., 2000, Science 289:1760. In other embodiments of the invention the reaction vessel may be a column, resin, membrane, matrix, bead or chip.

The conditions under which the target is exposed to the ligand library may vary. Non-limiting examples include binding reactions where the temperature is less than about 5° C. or from about 5° C. to about 25° C. or from about 25° C. to about 40° C. or over about 40° C. Further non-limiting examples include binding reaction conditions where the pH is less than about 5 or from about 5 to about 9 or over about 9. Further non-limiting examples include binding reactions in solutions which are comprised of water, an alcohol, an organic solvent or combinations thereof. Further non-limiting examples include binding reaction conditions where the additives may include ions, salts, detergents, reductants, oxidants or combinations thereof. A further non-limiting example includes binding reaction conditions where the target is immobilized. A further non-limiting example includes binding reaction conditions where ligands are immobilized. A further non-limiting example includes binding reaction conditions where the target and the ligands are in solution.

A further non-limiting example includes binding reaction conditions where the ligand comprises a marker such as biotin, fluorescein, digoxygenin, green fluorescent protein, radioisotope, his tag, a magnetic bead, an enzyme or combinations thereof.

In one embodiment of the invention, the targets may be screened in a mechanism based assay. The mechanism based assay includes but is not limited to an assay to detect ligands which bind to the target. This may include a solid phase or fluid phase binding event with either the ligand, the protein or an indicator of either being detected. Alternatively, the gene encoding the protein with previously undefined function can be transfected with a reporter system (including but not limited to β-galactosidase, luciferase, green fluorescent protein, etc.) into a cell and screened against the library, ideally by a high throughput or ultra high throughput (e.g., 1560 wells per plate or chip) screening, or with individual members of the library. In an alternative embodiment of the invention other mechanism based binding assays may be used. These include biochemical assays measuring an effect on enzymatic activity, cell based assays in which the target and a reporter system (e.g., luciferase or β-galactosidase) have been introduced into a cell, and binding assays which detect changes in free energy. Binding assays can be performed with the target fixed to a well, bead or chip or captured by an immobilized antibody or resolved by capillary electrophoresis. The bound ligands are usually detected using colorimetric, fluorescence or surface plasmon resonance methods. In the column based binding assay, the binding may be performed in a well or other vessel, on a gel, etc.

While there are a number of ways these assays can be done, following inductive thought, only the chemicals which bind to the protein target are relevant and can teach its function. In addition, the fluid phase more accurately reflects the true biological conformation. Furthermore, in the reaction both the protein and the chemicals preferably are not tagged, which reduces the risk that the protein is constrained in some way by coupling to a plate or a bead, or that the ligand is not in the same fluid phase conformation that it will have in the cell or the blood. Consequently, in a preferred embodiment of the invention, 1 to 20,000 ligands (with 1000 to 10,000 preferred) may be mixed together with 1 ng to 1 mg of each protein (with 0.1 to 100 μg preferred) in a small volume (1 fL to 1 mL with preferred range of 0.1 μL to 100 μL) to have a 0.1 μM to 100 μM concentration with a preferred range of 0.1 μM to 10 μM. In particular embodiments of the invention, by looking at only the 1 to 500 ligands which would be expected to bind to each protein with micromolar to nanomolar affinity, one avoids having to screen millions of combinations individually. This overcomes the need to tag the library in any other way than the molecule's own mass, isotope pattern or fragmentation pattern, because mass spectroscopy can resolve and identify the possible 1 to 5 hits per well. Alternatively, IR and/or FTIR can be used alone or in combination with mass spectroscopy to resolve and identify hits.
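
The relationship among protein mass, reaction volume and molar concentration recited above can be checked with a short calculation. The following Python sketch is illustrative only; the function name and the 50 kDa molecular weight used in the example are assumptions, since the description does not specify a molecular weight for any particular target.

```python
def molar_concentration_um(mass_ug, molecular_weight_kda, volume_ul):
    """Concentration in uM of mass_ug micrograms of a molecule of
    molecular_weight_kda kilodaltons dissolved in volume_ul microliters."""
    # ug divided by kDa gives nmol; nmol per uL is mM; times 1000 gives uM.
    nanomoles = mass_ug / molecular_weight_kda
    return nanomoles / volume_ul * 1000.0


if __name__ == "__main__":
    # Hypothetical example: 1 ug of a 50 kDa protein in 100 uL.
    print(molar_concentration_um(1.0, 50.0, 100.0))
```

For the hypothetical 50 kDa protein, 1 μg in 100 μL corresponds to about 0.2 μM, which falls within the 0.1 μM to 10 μM preferred range stated above.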

5.1.4. Ligand-Target Separation and Ligand Identification

In a preferred embodiment of the invention, ligand-target pairs are separated from unbound ligands and unbound targets by liquid chromatography, ligand-target pairs are separated from each other in a second liquid chromatography step, and ligands which bind are identified by mass spectroscopy. In various embodiments of the invention, the solution phase binding may occur in a well, tube or column. Capillary electrophoresis and/or other detection methods may be used to deconvolute ligands from the library. Particularly, HPLC and mass spectroscopy or capillary electrophoresis and mass spectroscopy can measure the molecules with extreme sensitivity. In addition, this technique can be done in extremely small volumes, which is critical to optimally utilize the small amounts of each member of the chemical library. For example, fewer than 20,000 ligands from the chemical library may be pooled with the protein for binding, again in each well of 96 well plates, at ≦10 μM in approximately 100 μL with 1 μg of protein. In a preferred embodiment, HPLC is performed in 96 well plates with cartridges to serve as the columns for each well. In another embodiment, the separation is performed in parallel in 384 well, 1536 well, or 10,000 or greater well formats using columns, wells, cartridges, chips, or filters. Alternatively, this may be performed in a standard HPLC column, spin column, or other column. The first cartridge/column may be a gel permeation or size exclusion or gel filtration resin (e.g., G25 like resin, Pharmacia) to hold the unbound molecules in the resin but allow the bound ligand and protein to pass through. A small sample volume is desired (preferably 1 to 100 μL or less) yet this procedure may dilute the sample by one or more orders of magnitude. It is helpful, therefore, to use a small and narrow column (preferably having a diameter of 1 to 2 mm or less and a length of 5 to 200 mm; Rocket Column, Biorad or Pharmacia columns) to minimize dilution of the sample.
Capillary Liquid Chromatography can also be used. The size exclusion resin separates the protein, along with small molecules bound to it with high affinity (K_(d)≦1.0 μM), from the unbound molecules. The next cartridge/column would use a hydrophobic or hydrophilic reverse phase HPLC resin, the choice of which depends upon the hydrophobicity of the ligand library being used: a C18 column (silica, hydrophobic; used with less hydrophobic ligands), a C8 column (more hydrophilic; used for more hydrophobic ligands), a cyano column (used for more hydrophilic ligands) or SB8U from Agilent, which can be used for either hydrophilic or hydrophobic ligands. These reverse phase HPLC methods separate the bound small molecule ligands from the protein and concentrate the small molecules and protein sample via resin binding. Subsequently, the small molecules may be eluted from the protein and the resin and the eluants may be collected in a 96 well plate. Provided one knows the amount of the starting material, affinity may also be measured in this step. Alternatively, competition studies can be done at a later time to quantitate binding affinity. These eluants may then be transferred to a mass spectrometer and characterized. This may be done robotically in real time, potentially even in the 96 well format, perhaps using either a parallel multiple channel microchip system or a parallel spray interface. Alternatively, chip based MALDI TOF mass spectrometry may be used. In this case, the protein fraction from the column (spin, HPLC, capillary, other) can be spotted onto a chip or a filter in a 96 well or greater format. The Omniflex or Autoflex MALDI instruments from Bruker Daltonics automatically desorb and analyze each of the samples from 100 sample and 1536 sample formats, respectively.

Non-limiting forms of mass spectrometry that may be used include electrospray, ion trap, Fourier Transform, MALDI, single or triple quadrupole in single MS, MS-MS, or MS-MS-MS formats.

Eluents may be characterized using a software package for use with the mass spectrometer supplemented with information about the ligand library used. Mass spectroscopy may be used to identify compounds by direct detection of their mass. However, mass spectroscopy may also be used to detect compounds, scaffolds or linkers containing elements which resolve into characteristic isotope patterns (e.g., ³⁵Cl, ¹³N, ²H) or compounds having unique fragmentation patterns (e.g., penicillin). For example, chlorine-containing compounds will be comprised of ³⁵Cl and ³⁷Cl which will produce two mass peaks, 2 AMU apart with a 3:1 intensity ratio. Similarly, bromine-containing compounds will be comprised of ⁷⁹Br and ⁸¹Br which will produce two mass peaks, 2 AMU apart with a 1:1 intensity ratio. These approaches may be used as an alternative to or in combination with true molecular weight to identify a compound.
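
The chlorine and bromine isotope signatures described above can be reproduced with a short calculation. The following Python sketch is illustrative only: it considers just the two stable isotopes of each halogen, ignores contributions from other elements (e.g., ¹³C), and uses approximate natural abundances that are assumptions not stated in the description. The function and constant names are hypothetical.

```python
from math import comb

# Approximate natural abundances of the heavier isotope (assumed values).
CL37_ABUNDANCE = 0.2424  # 37Cl; the remainder is 35Cl
BR81_ABUNDANCE = 0.4931  # 81Br; the remainder is 79Br


def _binomial(n, p_heavy):
    """Probability of k heavy isotopes among n atoms, for k = 0..n."""
    return [comb(n, k) * p_heavy**k * (1 - p_heavy) ** (n - k) for k in range(n + 1)]


def halogen_isotope_pattern(n_cl=0, n_br=0):
    """Relative peak intensities keyed by mass offset in AMU (0, 2, 4, ...).

    Intensities are normalized so that the most intense peak equals 1.0.
    """
    cl = _binomial(n_cl, CL37_ABUNDANCE)
    br = _binomial(n_br, BR81_ABUNDANCE)
    peaks = [0.0] * (n_cl + n_br + 1)
    for i, a in enumerate(cl):
        for j, b in enumerate(br):
            peaks[i + j] += a * b  # each heavy atom adds about 2 AMU
    top = max(peaks)
    return {2 * k: v / top for k, v in enumerate(peaks)}


if __name__ == "__main__":
    print(halogen_isotope_pattern(n_cl=1))  # roughly 3:1 at offsets 0 and 2
    print(halogen_isotope_pattern(n_br=1))  # roughly 1:1 at offsets 0 and 2
```

With more than one halogen atom per compound the same calculation gives multi-peak patterns, which is one way a spectral identifier could be distinguished within a mixture.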

Mass spectroscopy enables the mass, isotope, and fragmentation pattern to be determined so accurately that, coupled with software, the exact member of the library may be identified except for the isomer. Following this, the theoretically expected 500 or so micromolar to nanomolar hits can be pulled from the original library and synthesized on a larger scale. If the molecule is a peptide, it can be fused to the TAT transducing sequence which allows proteins to cross the cell membrane.
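
One simple way software could relate an observed mass to library members is a tolerance-based lookup. The following Python sketch is a minimal illustration under stated assumptions, not the software package referred to above: the function name, the parts-per-million tolerance, and the compound names and masses are all hypothetical. Consistent with the statement that isomers cannot be distinguished by mass alone, library members of equal mass are returned together.

```python
def match_library(observed_mass, library, tol_ppm=10.0):
    """Return the sorted names of library members whose mass lies within
    tol_ppm parts per million of observed_mass.

    library maps a compound name to its expected mass (same unit as
    observed_mass). Isomers share a mass and are therefore all returned.
    """
    hits = []
    for name, mass in library.items():
        if abs(observed_mass - mass) / mass * 1e6 <= tol_ppm:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical library: compound_A and compound_B are isomers.
    library = {"compound_A": 300.1000, "compound_B": 300.1000, "compound_C": 412.2500}
    print(match_library(300.1010, library))
    print(match_library(412.3000, library))
```

In practice such a lookup would be combined with the isotope and fragmentation patterns described above to narrow the candidates further.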

In another embodiment of the invention, ligands are characterized by IR or FTIR in addition to or instead of mass spectroscopy analysis. These techniques permit identification of ligand functional groups or substitutions (e.g., hydroxyl or amino groups). Used in combination with mass spectroscopy, this may facilitate differentiation between ligands of identical molecular weight.

According to the invention, the dissociation constant (K_(d)) of the ligand-target pair should be less than about 100 μM and preferably less than about 10 μM. While not dispositive, the dissociation constant (K_(d)) of the ligand-target pair is one factor which may guide those skilled in the art in determining the utility of a ligand in determining target function and as a drug lead. Thus, the invention contemplates but does not necessarily prefer ligand-target pair interactions where the dissociation constant (K_(d)) is less than about 1 μM or less than about 100 nM or less than about 10 nM or less than about 1 nM or less than about 100 pM or less than about 10 pM.

If no hits or a low number of hits with reasonable affinity are found, a structural or chemical gap in the structural diversity of the chemical library may have been identified. In such a case, target directed synthesis can be employed to fill in that gap. If low affinity binders are found, the binding can be repeated with a library containing photoactivatable (or other) linkers on one of the functional domains. After the first column, when only the protein and molecules binding to it are present, the photoactivation step can be performed, after which the small molecules can be eluted by reverse phase HPLC. In this way, the target is used as a template, because two molecules which each bound with a low affinity will, when linked together, have an increased affinity for the target. In a preferred embodiment, the increase in affinity is 2 to 100 fold.

5.1.4.1. Exemplary Chemical Array Assay Experimental Methods and Results

Methods for HPLC Based Assay

Drug-like chemical compounds representing a collection of drug-like chemical scaffolds (Sigma-Aldrich, ICN, Calbiochem) were weighed and mixed to a final concentration of 20 uM each in 50 mM ammonium acetate pH 7, 10% methanol. 1 uM to 20 uM tubulin or P38 MAP kinase (Sigma) were dispensed into HPLC low volume sample cuvettes (Waters) and mixed with 0.5 uM to 20 uM compounds. After mixing and a 15 minute 37° C. incubation, the cuvettes were placed on ice and injected into the HPLC (Waters 2690) using an autoinjector (Waters) onto a 150 mm×2.1 mm ID Pinkerton GFF II column (Regis Technologies) for dual size exclusion and phase separation with a 50 mM ammonium acetate, 10% methanol running buffer. The protein target and bound compounds eluted in the column void volume as detected using a diode array detector, and most of the compounds absorbed well at a wavelength of 243 nm. In some cases, using low concentrations of each compound (0.5 to 5 mM) and fewer than 10 compounds which could be easily separated from one another, it was possible to titrate in the two protein targets and observe a corresponding titration in the level of UV absorbance of the specific compound known to bind one of the protein targets but not of nonspecific control compounds.

We optimized the column dimensions and the choice of resin to maximize the separation of the compounds bound to the protein targets from the unbound compounds. Resins which elute protein in the void volume and small column diameters and lengths which minimize the void volume were used. Such columns minimize the amount of dilution of the protein sample and minimize the time required for each assay, thereby minimizing the amount of bound compound that dissociates from the protein (as governed by the K_(off) rate). These features enabled the use of minimal amounts of reagents, as well as sensitive detection methods. The column lengths were such that the protein eluted in less than 2 to 3 minutes. A number of HPLC columns, including the Regis 150 mm×2.1 mm GFF II column, a 1.0 mm×100 mm YMC Diol column, a 2.1 mm×150 mm Phenomenex Polyhydroxymethacrylate (Polysep) column, and a Jordi 2.1×150 mm Divinyl Benzene column, were tested. Similarly, other running buffers were tested in which the salt and methanol concentration were varied, and the ratio of protein target to small compounds in the binding reaction was varied from 1000:1 to 1:1000. Resins representative of different classes were tested for their ability to separate the protein fraction from the drug-like small molecule compounds, and to minimize the cycle time for all of the compounds to elute from the column. These characteristics of the columns are determined by surface properties and limitations on flow rates due to resins collapsing under backpressure. Being silica based and thus resistant to pressure, the YMC diol column had a cycle time of under 10 minutes but was only able to separate approximately 50% of the compounds in the 100 compound mixture listed in FIG. 9 from the protein. The Phenomenex Polyhydroxymethacrylate column was able to separate approximately 80% of the compounds in the 100 compound mixture from the protein, and required a methanol gradient to achieve elution of many of the small molecule compounds; it tolerated only a relatively low flow rate (0.18 ml/min) because of its inability to tolerate backpressures over 600 PSI. The cycle time for the Phenomenex column was 1.5 hours with the gradient, and 35 minutes for a subset of compounds (15% of the total) which could be isolated without the gradient. Other polymer based columns [e.g., polyhydroxymethacrylate (Phenomenex, Shodex, Waters), polymethylmethacrylate (Shodex, TosohBiosep), Sepharose/Sephadex/Superose (Amersham Pharmacia Biotech)] also only tolerated relatively low flow rates. The Jordi DVB columns are divinyl benzene polymer columns, which were operated at high pressure (4000 PSI) and undesirably bound the protein as well as the compounds, thus giving no separation in the buffer system used. Other buffer systems are expected to allow separation of the protein from the unbound compounds. Different columns and resins were also combined in series, increasing the percentage of compounds separated from the protein but also increasing the cycle time. In applications where a longer cycle time (e.g., over 10 minutes per run) is acceptable, any of the above columns or a series of the above columns may be used.
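
The earlier observation that shorter assay times reduce the amount of bound compound lost from the protein, as governed by the K_(off) rate, can be illustrated with a simple first-order decay model. The following Python sketch is illustrative only: it assumes that dissociation is first order and that no rebinding occurs once the unbound compounds have been separated, and the off-rate used in the example is hypothetical rather than a measured value.

```python
import math


def fraction_complex_remaining(k_off_per_s, elapsed_s):
    """Fraction of ligand-target complex still intact after elapsed_s seconds,
    assuming first-order dissociation with rate constant k_off_per_s (1/s)
    and no rebinding."""
    return math.exp(-k_off_per_s * elapsed_s)


if __name__ == "__main__":
    # Hypothetical off-rate of 0.005 per second at several elution times.
    for seconds in (30, 120, 600):
        print(seconds, round(fraction_complex_remaining(0.005, seconds), 3))
```

Under these assumptions, the fraction of complex surviving falls exponentially with separation time, which is consistent with the preference stated above for columns from which the protein elutes within a few minutes.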

For shorter cycle times, other columns may be used. For example, the Regis GFF II column separated the protein fraction from 97% of the compounds tested. Its pressure rating of 8000 PSI was above that of the HPLC (Waters 2690) used in these assays, which was operated at a pressure of 6000 PSI. The cycle time of this resin was demonstrated to be easily less than 8 minutes and could be further decreased by using a faster flow rate in an HPLC that tolerates pressures up to 8000 PSI. The GFF II resin and GFF resin are internal surface reversed phase resins which were developed by Thomas Pinkerton for the direct analysis of drugs and drug metabolites in serum without interference by protein adsorption. The resins consist of a porous silica support with a hydrophilic external surface and hydrophobic internal pores accessible only to molecules with a molecular weight less than 12,000 daltons. These surfaces are produced by bonding the tripeptide glycine-phenylalanine-phenylalanine (GFF) or glycidoxylpropyline-phenylalanine-phenylalanine (GFF II) to the silica surfaces. The GFF or GFF II bonded beads are then treated with the exopeptidase carboxypeptidase A, which has a molecular weight (35,000 daltons) large enough to exclude it from the pores, resulting in the cleavage of the phenylalanine-phenylalanine portion from the outer surface. This treatment allows the glycine or glycidoxylpropyl to be exposed intact on the outer surface, making the outer surface hydrophilic but leaving the original tripeptide intact on the inner surface, thereby making the inner surface hydrophobic (as described, for example, by the manufacturer's packaging insert). The catalogue number of the column with the GFF II resin that was used is 288-4. Other columns with other catalogue numbers that are packed with these resins are also available from Regis Technologies and can also be used. The outer surface thus prevents large molecules from entering the inner layer through size exclusion and hydrophilic interactions.
Small molecules enter the inner surface, which is comprised of the hydrophobic support which retains and separates the compounds based upon hydrophobic interactions. Given the short cycle times and the degree of separation that can be achieved with the GFF II resin, the GFF II column was used for subsequent assays; however, other resins can also be used.

Protein fractions from the HPLC columns were dissociated with 1% TFA, and a 100 uL sample was injected onto a reverse phase column (Waters Symmetry Shield) to separate the compounds that had been bound to the protein. The compounds were eluted using an acetonitrile gradient past a UV detector and into a TOF mass spectrometer (Micromass LCT). The background signal was subtracted from each sample using controls containing the protein in the absence of compounds, and the mass spectrum was determined at cone voltages high enough to achieve fragmentation of the compounds (20 to 80 volts). In other mass spectrometry instruments, fragmentation can be achieved in a collision cell. The fragmentation pattern which is characteristic for each compound consists of the larger parent peak and other peaks representing fragments of the chemical compound or their isotopes. The fragmentation pattern of the compound(s) released from the protein target was compared to the characteristic fragmentation pattern observed for a compound standard to identify the compound(s) that bound the protein target. Alternatively, one or more characteristic isotope(s) of the parent peak representing the molecular weight of the compound was compared with the standard to identify the compound that bound the protein target. In another alternative analysis, the parent peak representing the molecular weight of the compound was itself compared with the standard to identify the compound. Sometimes, the combination of these methods was also used to identify the compound. Similar methods were applied under MS conditions which did not induce fragmentation of the compound, resulting in a mass spectrum containing peaks representing the molecular weight of the compound (e.g., the parent peak) and its isotopes.

Results from HPLC Based Method

SKB86002 is a ligand with micromolar affinity for the P38 MAP kinase protein target. P38 MAP kinase (5 uM) was mixed with 5 uM 86002 and separated by HPLC on the Diol column (FIG. 3). The protein fraction was collected and analyzed by mass spectrometry. The parent peak, fragments, and isotope peaks in the spectrum corresponded to the 86002 standard, indicating that the P38 MAP kinase isolates and extracts a specific ligand with micromolar affinity.

SKB86002 and quinine monohydrochloride (a nonspecific control compound) were mixed together to a final concentration of 5 uM each (FIG. 4). Increasing amounts of P38 MAP kinase protein (final concentrations 0, 2.5, 5 and 10 uM) were mixed with the compound mixture at a final concentration of 5 uM each, and the protein was separated by HPLC on the Diol column. The UV spectrum demonstrated a P38 concentration dependent reduction of the 86002 peak but negligible reduction of the quinine peak.

When the P38 protein fraction was collected at the mid-point in the titration (5 uM P38 MAP kinase+5 uM mixture of quinine and 86002) illustrated in FIG. 4, the compound extracted from the mixture and released from the protein was identified as 86002, and not quinine, based on the parent peak, fragments, and isotope peaks in the mass spectrum of the released compound (FIG. 5).

A mixture of equal amounts of 10 drug-like compounds including 86002 and colchicine was prepared (FIG. 6). Increasing amounts of P38 MAP kinase protein (final concentrations 0, 3.5, and 5 uM) were mixed with the 10 compound mixture at a final concentration of 0.5 uM of each compound, and the protein was separated by HPLC on the GFF II column (FIG. 7). The UV spectrum demonstrated a P38 concentration dependent reduction of the 86002 peak but negligible reduction of the colchicine peak or peaks representing the other compounds in the mixture. When the protein fraction was collected and the mass spectrum was determined, the spectrum included the parent and isotope peaks characteristic of 86002 at a level far higher than other peaks.

Increasing amounts of tubulin protein (final concentrations 0, 5, and 20 uM) were mixed with the 10 compound mixture at a final concentration of 0.5 uM of each compound, and the protein was separated by HPLC on the GFF II column (FIG. 8). The UV spectrum demonstrated a tubulin concentration dependent reduction of the colchicine peak but negligible reduction of the 86002 peak or peaks representing the other compounds in the mixture. When the protein fraction was collected and the mass spectrum determined, the spectrum included the peaks characteristic of colchicine at a level far higher than other peaks.

A mixture of equal amounts of 100 drug-like compounds including 86002 and colchicine was prepared (FIG. 9). P38 (2 uM) was mixed with the 100 compound mixture at a final concentration of 20 uM of each compound, and the protein was separated from the unbound compounds using the GFF II HPLC column (FIG. 10). The protein fraction was collected, the compounds were released from the protein, and the mass spectrum was determined. The spectrum contained a peak characteristic of 86002 at a level far higher than other peaks. Thus, P38 MAP kinase binds and extracts a ligand with micromolar affinity (86002) from a 100 compound mixture in a specific and concentration dependent manner. The mass spectrum background appears to be comparable to that generated using only 10 compounds (FIG. 7), indicating that the assay should be scalable to larger numbers of compounds (e.g., 1000s to 10,000s of compounds). For example, these methods may be used to analyze a library of over 10, 20, 40, 50, 75, 100, 200, 500, 1000, 2000, 5000, 10000, or more compounds or more chemical scaffolds.

Tubulin (5 uM) was mixed with the 100 compound mixture at a final concentration of 5 uM of each compound, and the protein was separated from the unbound compounds using the GFF II HPLC column (FIG. 11). The protein fraction was collected, the compounds were released from the protein, and the mass spectrum was determined. The spectrum showed the peaks characteristic of colchicine at a level far higher than other peaks. Thus, tubulin binds and extracts a hit (colchicine) from a 100 compound mixture in a specific and concentration dependent manner. The mass spectrum background appears to be comparable to that generated using the 10 compound mixture (FIG. 8), indicating that the assay should be scalable to larger numbers of compounds (e.g., 1000s to 10,000s of compounds). For example, these methods may be used to analyze a library of over 10, 20, 40, 50, 75, 100, 200, 500, 1000, 2000, 5000, 10000, or more compounds or more chemical scaffolds.

One way to increase the speed of the assay is to increase the flow rate (FIG. 12). The limiting factor affecting the maximum flow rate a column can withstand is generally the backpressure which the resin can tolerate before it collapses. One of the reasons the GFF II resin was selected is its ability to sustain pressures up to 8000 PSI, compared with most size exclusion gels (e.g., Sepharose, Superose, Superdex, polymethylmethacrylate, polyhydroxymethacrylate, etc.) which have maximum back pressures of 100-1500 PSI. At high flow rates, the GFF II column still achieved excellent separation of the protein from the 100 compound mix, and all molecules eluted from the column in approximately 6 to 7 minutes. Thus, one way to scale up the assay according to the invention is to perform HPLC using column switching devices including, but not limited to, the six column selection valves on the Waters 2790 HPLC, with injection of a new sample into a newly switched column every minute. Custom column switchers can be made for two or more columns, up to approximately 10 columns (FIG. 26).
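
The column switching arrangement described above implies a simple relationship between cycle time, injection interval and the number of columns that must be available. The following Python sketch is illustrative only; the function names are hypothetical, and the calculation assumes that each column is occupied for one full cycle and ignores any re-equilibration or valve switching time.

```python
import math


def columns_needed(cycle_time_min, injection_interval_min):
    """Minimum number of columns so that a free column is available at every
    injection, assuming each column is occupied for one full cycle."""
    return math.ceil(cycle_time_min / injection_interval_min)


def samples_per_hour(injection_interval_min):
    """Throughput achieved when one sample is injected per interval."""
    return 60.0 / injection_interval_min


if __name__ == "__main__":
    # Cycle times of 6 to 7 minutes with one injection per minute.
    for cycle in (6.0, 7.0):
        print(cycle, columns_needed(cycle, 1.0), samples_per_hour(1.0))
```

Under these assumptions, a 6 minute cycle with one injection per minute would occupy six columns, and a 7 minute cycle would call for seven, which lies within the range of up to approximately 10 columns mentioned for custom column switchers.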

Spin-Column Chromatography Methods

Drug-like chemical compounds representing a collection of drug-like chemical scaffolds (Sigma-Aldrich, ICN, Calbiochem) were weighed and mixed to a final concentration of 20 uM each in 50 mM ammonium acetate pH 7, 10% methanol. 5 uM to 20 uM bovine serum albumin (BSA) or tubulin (Sigma) were dispensed into HPLC low volume sample cuvettes (Waters) and mixed with 5 uM to 20 uM compounds. After mixing and a 15 minute 37° C. incubation, the cuvettes were placed on ice. 50 uL of the 100 compound mixture listed in FIG. 9 was then layered on top of a MicroSpin G-25 (Amersham Pharmacia Biotech) spin column which had been previously equilibrated with two washes of binding buffer (i.e., each wash involved adding 200 uL of 50 mM ammonium acetate, 10% methanol buffer, and spinning the buffer through the column into a 1.5 mL microfuge tube (Eppendorf) at maximum setting in a microfuge (Eppendorf) for 30 seconds to a minute). Such spin columns are generally used to desalt and exchange buffer for DNA probes after labeling, though G-25 is one of the classic size exclusion resins with a 25 KD molecular weight cut off. The spin column was then placed in a 1.5 mL microfuge tube (Eppendorf) and spun for 30 seconds at maximum setting in the microfuge (Eppendorf). Alternatively, a vacuum can be used to pull solution through the spin column, which is particularly useful when spin columns/cartridges are arrayed in the 96 well format and a vacuum manifold is used to pull the solution through the column into a 96 well plate.

In the case of BSA, the 50 uL solution in the bottom of the microfuge tube was loaded onto the HPLC, the UV spectrum was visualized and compared with an equivalent amount of the BSA/100 compound mixture before separation. In the case of tubulin, 25 uL of the solution at the bottom of the microfuge tube was dissociated with 1% TFA and injected onto a reverse phase column (Waters Symmetry Shield), and the compounds were eluted using an acetonitrile gradient past a UV detector into a TOF MS (Micromass LCT). Background was electronically subtracted from each sample using controls containing the protein in the absence of compounds and the mass spectrum was determined at cone voltages high enough to achieve fragmentation of the compounds (20 to 80 volts). In other mass spectrometers, such fragmentation can be achieved in a collision cell. The fragmentation pattern which is characteristic for each compound consists of the larger parent peak and other peaks representing fragments of the chemical compound or their isotopes. The fragmentation pattern of the compound(s) released from the protein target was compared to the characteristic fragmentation pattern observed for a compound standard to identify the compound(s) that bound the protein target. Alternatively, a characteristic isotope of the parent peak representing the molecular weight of the compound was compared with the standard to identify the compound that bound the protein target. In another alternative analysis, the parent peak representing the molecular weight of the compound was itself compared with the standard to identify the compound. Sometimes, a combination of these methods was also used to identify the compound. Similar methods were applied under MS conditions which did not induce fragmentation of the compound, resulting in a mass spectrum containing peaks representing the molecular weight of the compound (e.g., the parent peak) and its isotopes.

Results from Spin-Column Chromatography Based Methods

5 uM bovine serum albumin (BSA, Sigma) was mixed with the 100 compound mixture at a final concentration of 5 uM of each compound (FIG. 13). Half (50 uL) of the mixture was layered on top of a MicroSpin G-25 column and centrifuged. The protein containing fraction was collected at the bottom of the microfuge tube. When the initial protein/compound mixture was compared with the protein/compound mixture after separation using the spin column separation method, a significant purification of the protein was observed based on UV absorbance. When the same protocol was applied to a mixture of 20 uM tubulin and 20 uM of the 100 compound mixture and the mass spectrum was determined for the eluted protein-containing fraction, the spectrum showed the peaks characteristic of colchicine at a level far higher than other peaks. Although the background peak was slightly higher than that observed using the HPLC column separation (FIG. 14), the speed and scalability of this spin column separation make it highly attractive. For example, these methods may be used to analyze a library of over 10, 20, 40, 50, 75, 100, 200, 500, 1000, 2000, 5000, 10000, or more compounds or chemical scaffolds.

5.1.4.2. Exemplary Methods for the Use of Pattern Recognition Software to Identify Isolated Ligand(s)

The present invention provides methods for using pattern recognition analysis of a mass spectrum to identify a compound from a mixture that has been isolated using a protein target and any of the separation techniques described herein.

In these methods, mass spectrometry fragmentation patterns are determined for many or all of the compounds present in the initial mixture of candidate compounds. Alternatively, isotope or other mass spectrometry patterns are determined for these compounds (e.g., M+1 or M+2 isotope peaks). The mass spectrometer sorts the compounds, their isotopes, and/or their fragments on the basis of their mass to charge ratio, denoted m/z. The mass spectrometry conditions can be adjusted so that most or all of the peaks represent molecules having a charge of +1 (or −1), so that the value of some of the peaks is equal to the mass of the parent compound, an isotope, or a fragment of the parent compound (i.e., m/z=m/1=m). In some cases, other mass spectrometry conditions can be used so that some or all of the peaks represent molecules having a charge of +2 or greater (or −2 or lower), so that the value of some of the peaks is less than the mass of the parent compound, an isotope, or a fragment because the mass to charge ratio is less than the mass of the molecule (e.g., m/z=m/2). Thus, the mass spectrometry patterns consist of mass spectral peaks corresponding to masses (or mass to charge ratios if the magnitude of the charge on the molecules is greater than one) of the parent compounds, their fragments, and/or their isotopes.
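The relation between mass, charge, and peak position described above can be sketched as follows. This is an illustration only; it uses the simplified relation m/z = m/|z| given in the text and therefore ignores the mass of any adduct (e.g., an added proton). The function name and the example mass are hypothetical.

```python
def mass_to_charge(mass: float, charge: int) -> float:
    """Peak position for an ion of the given mass and charge.

    Simplification: m/z = m / |z|, ignoring the mass of the charge carrier.
    """
    if charge == 0:
        raise ValueError("a neutral molecule is not detected by the mass spectrometer")
    return mass / abs(charge)


example_mass = 400.0  # hypothetical parent mass, not taken from the source
print(mass_to_charge(example_mass, 1))   # singly charged: peak at the parent mass
print(mass_to_charge(example_mass, 2))   # doubly charged: peak at half the parent mass
```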

The mass (or mass to charge ratio) of each of these peaks is entered into the database of an information retrieval system. The mass spectrum of a compound of interest that was released from a protein target is generated, and then pattern recognition software is used to compare this pattern with those contained in the database. A match positively identifies the compound of interest. In one embodiment, peaks corresponding to two, three, or more of the most characteristic masses (compound 1: peaks A, B, and C; compound 2: peaks D and E; etc.) are entered into the database for each of the compounds in the initial mixture. Software (e.g., MassLynx, version 3.5 from Micromass) is used to search the mass spectrum of the compound(s) released from a protein target for peak A followed sequentially by a search for peaks B, C, D, E, etc. The presence of a particular peak is entered into a second database to indicate that the peak is present in the mass spectrum. In another possible method, the searches for particular peaks in the mass spectrum are performed in any order. Iterative search commands may also be used to analyze the mass spectrum. For example, if peak A corresponding to a particular compound is present in the mass spectrum, then the mass spectrum can be analyzed to determine whether another peak (e.g., peak B) characteristic of the same compound is also present in the mass spectrum. Alternatively, if a peak characteristic of a particular compound is not present in the mass spectrum, then the mass spectrum can be analyzed to determine whether a peak (e.g., peak D) characteristic of another compound is present in the mass spectrum. In yet another alternative method, multiple peaks are searched together by overlaying a macro program over MassLynx. The peaks identified as present are compared with those in the first database from the compounds in the initial mixture to identify the compound(s) released from the protein target. FIG. 16A contains an exemplary flow chart illustrating the steps for some embodiments of these methods.
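The iterative peak search described above can be sketched in a general-purpose language. The following is a minimal illustration and not the MassLynx implementation; the reference masses, the matching tolerance, and all names are hypothetical assumptions introduced for the example.

```python
from typing import Dict, List

# Hypothetical first database: compound -> characteristic peak masses (m/z).
REFERENCE_PEAKS: Dict[str, List[float]] = {
    "compound_1": [400.2, 358.1, 310.1],  # peaks A, B, C
    "compound_2": [285.1, 152.0],         # peaks D, E
}


def peak_present(spectrum: List[float], target: float, tol: float = 0.2) -> bool:
    """True if any observed peak lies within tol of the target mass (tol is assumed)."""
    return any(abs(observed - target) <= tol for observed in spectrum)


def identify(spectrum: List[float],
             reference: Dict[str, List[float]] = REFERENCE_PEAKS,
             tol: float = 0.2) -> List[str]:
    """Return compounds whose characteristic peaks are all present in the spectrum."""
    found: Dict[str, List[float]] = {}  # second database: peaks found per compound
    hits: List[str] = []
    for name, peaks in reference.items():
        found[name] = []
        for target in peaks:
            if not peak_present(spectrum, target, tol):
                break  # peak missing: move on to the next compound
            found[name].append(target)
        if len(found[name]) == len(peaks):
            hits.append(name)
    return hits


print(identify([400.25, 358.0, 310.1, 120.0]))
```

Requiring every characteristic peak is one possible design choice; a less strict rule (e.g., two of three peaks) could be substituted where spectra are noisy.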

In another embodiment, two, three, or more masses (or mass to charge ratios) corresponding to the most characteristic peaks of the mass spectrometry pattern are entered into the database for each compound in the initial mixture. In an exemplary method, this database uses a Microsoft Excel or Oracle program. Once the mass spectrum for the sample released from the protein target is determined and the two or three main peaks in the mass spectrum (e.g., the two or three peaks with the highest signal) are located, a search is performed on the database for the initial compound mixture using the masses (or mass to charge ratios) corresponding to those peaks. For example, the values of the masses can be used in the “Find” command of these programs to search for candidate compounds that produce peaks of that mass. The combination of masses identified in the search thus identifies the compound(s) present in the sample.
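A lookup of this kind can be sketched as follows, with an in-memory dictionary standing in for the spreadsheet or database. This is an assumption-laden illustration: the reference masses, the tolerance, the rule that at least two main peaks must match, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (mass or m/z, signal intensity)


def top_peaks(spectrum: List[Peak], n: int = 3) -> List[float]:
    """Masses of the n peaks with the highest signal."""
    ranked = sorted(spectrum, key=lambda peak: peak[1], reverse=True)
    return [mass for mass, _ in ranked[:n]]


def identify_from_top_peaks(spectrum: List[Peak],
                            reference: Dict[str, List[float]],
                            n: int = 3,
                            tol: float = 0.2,
                            min_matches: int = 2) -> List[str]:
    """Search the reference for each main peak and keep compounds matched repeatedly."""
    counts: Dict[str, int] = {}
    for mass in top_peaks(spectrum, n):
        for name, peaks in reference.items():
            # Equivalent of a "Find" on this mass, with an assumed tolerance.
            if any(abs(mass - ref_mass) <= tol for ref_mass in peaks):
                counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, c in counts.items() if c >= min_matches)


reference = {
    "compound_1": [400.2, 358.1, 310.1],
    "compound_2": [285.1, 152.0],
}
sample = [(400.2, 100.0), (358.1, 60.0), (120.0, 5.0), (310.1, 40.0)]
print(identify_from_top_peaks(sample, reference))
```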

In yet another embodiment, the intensity of the signal at a particular mass (or mass to charge ratio) is used to positively identify a compound. This technique is particularly applicable if the pattern being used is an isotope pattern. In this case, a database of compounds in the mixture is generated that contains both the mass and the intensity of each of the two or three most characteristic peaks. This information is then collected for the sample of interest. The search function of the database program is used to search for the correlated mass and intensity parameters. A match positively identifies a compound present in the sample.
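Matching on correlated mass and intensity can be sketched as follows. This is an illustration under stated assumptions: intensities are compared after scaling each pattern to its own largest peak, the sample peak list is assumed to have been restricted to the isotope cluster of interest, and the tolerances and names are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (mass or m/z, intensity)


def normalize(peaks: List[Peak]) -> List[Peak]:
    """Scale intensities so that the largest peak in the pattern equals 1.0."""
    base = max(intensity for _, intensity in peaks)
    return [(mass, intensity / base) for mass, intensity in peaks]


def pattern_matches(sample: List[Peak],
                    reference: List[Peak],
                    mass_tol: float = 0.2,
                    intensity_tol: float = 0.1) -> bool:
    """True if every reference peak has a sample peak close in mass and relative intensity."""
    sample_n = normalize(sample)
    for ref_mass, ref_int in normalize(reference):
        if not any(abs(mass - ref_mass) <= mass_tol
                   and abs(intensity - ref_int) <= intensity_tol
                   for mass, intensity in sample_n):
            return False
    return True


# Hypothetical isotope pattern: parent peak with M+1 and M+2 peaks.
reference = [(400.2, 100.0), (401.2, 24.0), (402.2, 4.0)]
sample = [(400.2, 5000.0), (401.2, 1250.0), (402.2, 180.0)]
print(pattern_matches(sample, reference))
```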

In various embodiments for any of the methods of the present invention for the identification of one or more compounds of interest (e.g., compounds released from a target), one or more mass spectral peaks corresponding to one or more fragments of a compound and/or one or more mass spectral peaks corresponding to one or more isotopes of a compound are used to identify the compound. In other embodiments, the parent peak is used in the identification of the compound. In various embodiments, the parent peak is the only spectral peak used in the identification of a compound. In yet other embodiments, the parent peak is used in conjunction with one or more peaks corresponding to a fragment or an isotope in the identification of a compound. In still other embodiments, a parent peak is not used in the identification of the compound. In other embodiments, the compound is a component recovered from a mixture of at least 5, 10, 20, 40, 50, 75, 100, 200, 500, 1000, 2000, 5000, 10000 or more compounds that were contacted with a target of interest. In other embodiments, the compound is a component recovered from a mixture of compounds that includes at least 5, 10, 20, 40, 50, 75, 100, 200, 500, 1000, 2000, 5000, 10000 or more different chemical scaffolds. In particular embodiments, a parent peak is used in the identification of a compound from a mixture of compounds that includes at least 5, 10, 20, 40, 50, 75, 100, 200, 500, 1000, 2000, 5000, 10000 or more different chemical scaffolds.

Any of the methods described herein may be implemented using virtually any computer. FIG. 15 shows such an exemplary computer system. Computer system 2 includes internal and external components. The internal components include a processor 4 coupled to a memory 6. The external components include a mass-storage device 8, e.g., a hard disk drive, user input devices 10, e.g., a keyboard and a mouse, a display 12, e.g., a monitor, and usually, a network link 14 capable of connecting the computer system to other computers to allow sharing of data and processing tasks. Programs are loaded into the memory 6 of this system 2 during operation. These programs include an operating system 16, e.g., Microsoft Windows, which manages the computer system, software 18 that encodes common languages and functions to assist programs that implement the methods of this invention, and software 20 that encodes the methods of the invention in a procedural language or symbolic package. Languages that can be used to program the methods include, without limitation, Visual C/C++ from Microsoft. In preferred applications, the methods of the invention are programmed in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including algorithms used in the execution of the programs, thereby freeing a user of the need to program procedurally individual equations or algorithms. An exemplary mathematical software package useful for this purpose is Matlab from Mathworks (Natick, Mass.). Using the Matlab software, one can also apply the Parallel Virtual Machine (PVM) module and Message Passing Interface (MPI), which support processing on multiple processors. This implementation of PVM and MPI with the methods herein is accomplished using methods known in the art. Alternatively, the software or a portion thereof is encoded in dedicated circuitry by methods known in the art.

5.1.5. Analysis of Target Function

To systematically classify target function, the hits for each target may be screened in cell and tissue based assays representing each of the major molecular mechanisms in disease pathogenesis. Where the target is originally selected based on differential expression analysis, assays which are particularly relevant to that differential expression are preferred (e.g., a proliferation assay would be particularly relevant where the target arose from differential expression analysis of carcinoma cells). This panel of assays includes but is not limited to assays to detect and/or measure: apoptosis, proliferation, ischemia/necrosis, inflammation, fibrosis, angiogenesis, metabolic signaling, infection and development/differentiation. By focusing on pathogenic pathways and studying disease specific and cell specific targets, novel targets for a number of therapeutic areas may be identified. The goal of this panel is to screen for small molecule/protein members of the molecular pathways leading to significant diseases including but not limited to chronic degenerative diseases (e.g., Alzheimer's disease, osteoarthritis, osteoporosis), metabolic diseases (e.g., diabetes, obesity), inflammatory diseases, cancer, cardiovascular diseases (e.g., coronary artery disease, hypertension, congestive heart failure, cardiomyopathy, chronic renal failure) and infections (e.g., viral, bacterial, protozoan, and mechanisms of drug resistance). The assays are designed such that the same assay can be used in cells first with follow up in tissue biopsied from patients with the disease. To identify potentially toxic molecules, necrosis assays may be performed on all molecules. The standard industry microtitre plates of 96 wells provide sufficient scale to conduct these phenotypic screens though high throughput and ultra high throughput formats are not precluded. Assays may be performed on cell lines, primary cell culture, tissue biopsies, tissue models, in vivo animal models, or other organisms. In a preferred embodiment, the bioassays are performed using human cell lines and tissues. According to other embodiments, the bioassays may be performed using cells, tissues, organs or whole organisms of any species. Though ligands can be pooled in these assays, it is useful that each phenotypic assay be performed with one species of molecule per well to avoid agonist and antagonist interactions which may mask the phenotypic effect. The assays include but are not limited to allowing the diseased cell or tissue to enrich for genes which may be relevant to disease or a therapeutic response.

Although applications of the invention toward target identification in cancer, diabetes and stimulation of cells with TGFβ are described in the examples, the approach set forth above can be broadly applied to any disease, cell stimulus, biological modulator or condition. Assays other than those described and those for other molecular pathways relevant to diseases can also be used. By taking this approach starting with genes up-regulated or down-regulated in diseased cells relative to normal cells or tissues or in cells in the presence of an agonist or antagonist (or a partial agonist or antagonist), one is enriching for targets with specificity and a good therapeutic index. By crossing this specificity with molecular mechanisms in disease pathogenesis, one is enriching for targets which may be therapeutic. By sequentially combining a biochemical binding assay which selects hits in a highly efficient manner from large libraries and using these hits in a low throughput, high quality phenotypic bioassay reflective of the human disease, one can determine the function of the gene.

5.2. Phenotype to Genotype

In an alternative series of embodiments, the present invention relates to a method of screening a plurality of potential ligands in at least one bioassay, selecting ligands which produce a change in phenotype in a bioassay, and using the ligand to screen candidate targets to identify the particular target(s) responsible for the altered phenotype. In various preferred embodiments, individual species of ligands are separately screened in bioassay(s). A ligand which produces a change in phenotype in a bioassay may be exposed to a plurality of potential targets under conditions which permit ligand-target interaction. In various preferred embodiments of the invention, the target is a peptide or protein and each peptide or protein target is associated with a polynucleotide which encodes that target (e.g., by phage display or cell surface display). Selected targets and their corresponding polynucleotides are collected. The DNA sequence encoding targets which are proteins may be sequenced, cloned, and validated. The differential expression of these targets may then be studied in human disease tissue biopsies particularly where the molecular mechanism of the phenotype may be phenotypically relevant. Similarly the ligand may be studied in diseased tissues and/or in vitro or in vivo models of these diseases. One embodiment is outlined in FIG. 2. As noted above, the embodiments listed in sections 5.1.1 to 5.1.5 can be used in any of these methods.

High throughput phenotype cell based assays according to the invention differ from high throughput screening methods as they are currently practiced. The typical high throughput screen is a mechanism based assay where the gene for a validated target is transfected into a cell line with a reporter system (e.g., green fluorescent protein, luciferase, etc.) and members of a chemical library are screened for activation of the reporter. Instead of conducting this type of screen, the present invention focuses on looking for a significant change in phenotype in cell lines without predetermining the molecular target in a bioassay. These bioassays are designed to look for ligands which modulate an important biological stimulus or an important pathogenic mechanism. Non-limiting examples include apoptosis, proliferation, ischemia, necrosis, inflammation, fibrosis, invasion, angiogenesis, metabolism, infection and embryogenesis. In addition, individual pathways of cellular stimuli with pluripotent effects can be blocked by antisense, translocating peptides, antibodies or other techniques to identify targets which are more specific in their effect. In this way we achieve an association of ligands from the library (as described above) with a phenotype in a bioassay. Assays for molecular mechanisms in disease including but not limited to those described above may be adapted to high throughput screening.

Although applications of the invention toward target identification in cancer are discussed herein, the invention can be broadly applied to any disease, cell stimulus or condition. Assays other than those described related to biological stimuli and those for other molecular pathways relevant to diseases or biology can also be used. By sequentially combining a bioassay in which a ligand is associated with a particular phenotypic change of interest and using these hits to select for the target in a protein or peptide display library, one can clone the gene for and identify the target. The differential expression of the target in human disease tissue may then be studied. In addition, the specificity of a ligand's effect in an in vitro or in vivo bioassay may reveal the utility of that ligand in modulating a biological effect or treating a particular disease.

5.3. Mapping Molecular Signaling Pathways

Once a number of genes have been shown to be involved in a particular molecular pathway of disease pathogenesis, the targets can be mapped within the molecular pathway relative to one another and to known members of the pathway. The ligands binding to the different proteins may be derivatized with photoactivatable crosslinkers and used to position each member in the pathway. For example, one member of a pathway is first labeled (e.g., GFP). Next, members of the pathway are exposed to ligands derivatized with functional groups which may be crosslinked. Then, the mixture is exposed to the crosslinking stimulus. Lastly, the selected member of the pathway is collected using the label (e.g., GFP) and any compounds which have become associated with it are identified. This may be repeated stepwise to identify earlier or later pathway members. These methods have the advantage of not requiring the prior identification of the binding sites for the ligands or the determination of the secondary or tertiary structure of the target molecule prior to crosslinking.

Pathway members may then be used as targets in ligand screens. By comparing the phenotype of each ligand which selectively binds each pathway member, positional information about each pathway member relative to others may be obtained. This information can be used to validate and select the best target for a given disease indication and eventually select the best therapy through pharmacogenetic based diagnosis.

5.4. Optimization of Leads

The present invention provides a method for optimizing leads and increasing the hit ratio. The term “lead” as used herein refers to a ligand with pharmaceutically desirable properties. Preferably the molecule would be considered a “small” molecule in the art, for example having a molecular weight between 50 Da and 3000 Da. The method has broad application, but is particularly useful for obtaining ligands which interfere with protein-protein interactions.

Proteins usually have a distinct region on their surface or a region buried deeper as a pocket, which displays increased affinity towards binding small molecules. These so called binding sites have relatively well defined shape and size. Many drug molecules bind one at a time to different regions in the protein. To identify small, drug-like molecules which bind to a target protein, one approach is to test the binding capabilities of “drug-like” molecules in a binding assay (see, for example, Lipinski, et al., Adv. Drug Delivery Rev., 23, 3-25, 1997, for characteristics of “drug-like” molecules). For example, the binding assays (e.g., size exclusion chromatography-based assays) described herein are uniquely suitable to investigate a large number of small molecules under physiological conditions and rapidly identify reasonably strong binders. The binding assays according to the invention can be any assay which measures binding including, but not limited to, size exclusion chromatography based assays, chip based assays, filter based assays, array based assays, column based assays, filtration based assays, and binding assays in solution or in solid phase. A preferred assay is one which can pull ligands from a mixture of compounds (e.g., the size-exclusion chromatography based assay described herein) because of its highly parallel nature and ability to multiplex. The identified small molecules (“hits”) can be further optimized. In this case, any one of the molecules is considered to be an early drug candidate. Many of these molecules have a molecular weight within, or close to, 200-500 daltons. As described further below, combinatorial chemistry using the hits from the binding assay can be used to react two or more molecules to generate a product with higher affinity for the target protein (FIG. 23). The binding assay can then be repeated using concentrations of reagents designed to identify ligands (e.g., products from the combinatorial chemistry reactions) with higher and higher affinity. For example, while the first hit may have a K_(d) in the micromolar range, the optimized lead may be selected for an affinity with a K_(d) in the nanomolar or higher affinity range. In one preferred embodiment, mixed combinatorial chemistry in solution phase is performed. The method overcomes a common bottleneck in combinatorial chemistry, the purification of individual compounds from mixtures: purification and culling are not needed because a target molecule (e.g., a target protein) can be used to purify the high affinity binders from a mixture of compounds.
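The effect of reagent concentration on which ligands are recovered can be illustrated with the standard single-site binding relation, fraction bound = [L]/([L] + K_(d)), where [L] is the free ligand concentration. The sketch below is an illustration only; it assumes simple 1:1 binding at equilibrium with ligand in excess over target, and the concentrations and K_(d) values are hypothetical examples, not data from the experiments described herein.

```python
def fraction_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """Fraction of target occupied for 1:1 equilibrium binding (assumed model)."""
    return free_ligand_molar / (free_ligand_molar + kd_molar)


micromolar_hit_kd = 1e-6   # hypothetical first hit, K_d = 1 uM
nanomolar_lead_kd = 1e-8   # hypothetical optimized lead, K_d = 10 nM

for ligand_conc in (5e-6, 5e-8):  # 5 uM, then a more stringent 50 nM
    weak = fraction_bound(ligand_conc, micromolar_hit_kd)
    strong = fraction_bound(ligand_conc, nanomolar_lead_kd)
    print(f"[L]={ligand_conc:.0e} M  weak={weak:.3f}  strong={strong:.3f}")
```

Under this model, lowering the ligand concentration leaves the nanomolar binder largely bound while the micromolar binder is mostly free, which is the rationale for repeating the assay at concentrations chosen to select higher affinity ligands.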

Because a large number of chemical leads may be characterized at the biochemical and phenotypic levels, a structure activity relationship may be established to serve as a basis for lead optimization. If molecules with similar activities are identified, the structure activity relationship (SAR) can be determined. A target directed synthesis technology can be employed to crosslink molecules binding close to each other, indicating whether their activity is mediated through the same active subsite on the protein or through different subsites on the protein target. In one embodiment, one of the molecules contains a photoactivatable crosslinker, or one molecule contains a reactive group that is reactive with a group on a second molecule. In this way additional different functional subsites on the target can be mapped and different mechanisms can be interpreted from the phenotypic findings with molecules binding to those subsites (e.g., agonist vs. antagonist). Photoactivatable crosslinkers on one of the functional groups of the ligand scaffold may be used to link ligands bound to the target, thus using the target molecule as a template.

This different approach to lead optimization and synthesis is based on the fact that the majority of drug and drug-like molecules (due to their size, which matches the size and shape of a binding site in the target while also taking advantage of its polarity and charge distribution) consist of two or more smaller subunits that have a molecular weight of about a third to a half that of a drug or drug-like molecule. In some embodiments, these subunits have a molecular weight of less than 1,000, 500, or 200 daltons, may or may not contain an entire drug-like scaffold, and bind with μM or less than μM K_(d) affinity. These subunits act as building blocks that are connected either directly or through a linker to form a larger but still drug-like molecule. Also, these individual building blocks usually bind to the binding site with a decreased but still useful affinity. Alternatively, building blocks may be included which impart different characteristics important to lead optimization (e.g., solubility, membrane crossing, bioavailability, and/or up or down-regulation of metabolism). The size exclusion chromatography based assay technology offers a unique opportunity to identify two or more such small “building blocks” from mixtures containing hundreds or thousands of individual subunit molecules. Two or more building blocks may bind together to one protein binding site.

The protein's ability to select suitable small molecule pairs or triplets can be exploited for small molecules that contain reactive groups and for small molecules that do not contain reactive groups. In the former case, the building blocks have complementary reactive groups on them or attached to them with linkers (FIG. 24). As the small molecules are organized and selected by the protein's binding site, they are located next to each other. The close proximity of those reactive groups may trigger the “coupling” reaction in which the small molecules react to form a product. The resulting products with the highest affinity for the protein may be identified using the size exclusion chromatography assay or any other assay described herein together with high resolution LCMS. In this case, a mixture of the building blocks preferably has the reactive groups in different orientations on them. Preferably, a combinatorial library of such variations is tested. This method may take advantage of, for example, condensation, substitution, and addition reactions. Any reaction is suitable which uses moderately reactive groups, so that they would not react or would react only very slowly in the absence of the protein's binding site, which orients them in very close proximity to each other. These slow reactions can be accelerated if the reactants are close to each other. The protein's binding site provides a “template” for the different small fragments to find their pairs and “self assemble.” In some embodiments, the small molecules react in the presence of the protein but not in the absence of the protein. In certain embodiments, the amount of product formed in the presence of the protein is at least 2, 5, 10, 20, 30, or 50-fold more than the amount of product formed in the absence of the protein. In preferred embodiments, the crosslinking or reaction occurs under physiological conditions.

In other embodiments, the building blocks are relatively non-reactive, stable, small molecules including, but not limited to, small heterocycles, amino acids, carbohydrates, aromatic rings, ureas, thioureas, guanidines, amines, acids, sulfones, sulfoxides, or any small molecule subunit of any drug or drug-like molecule (FIG. 25). These small molecules may bind as described above in pairs or triplets and the resulting products with high affinity for the protein may be identified using the size exclusion chromatography based binding assay. The functionalized distinct pairs or triplets then may serve as scaffolds for solid or solution phase combinatorial chemistry, and a number of combinations may be synthesized and retested. The strong binders then may be used as starting scaffolds for further combinatorial chemistry, using the targeted protein as the main directing force for selecting promising small molecules. In some embodiments, the small molecules react in the presence of the protein but not in the absence of the protein. In certain embodiments, the amount of product formed in the presence of the protein is at least 2, 5, 10, 20, 30, or 50-fold more than the amount of product formed in the absence of the protein. In preferred embodiments, small molecule fragments are minimally substituted (e.g., substituted with a group that causes only a small change in molecular weight), but “meaningfully” substituted (e.g., substituted with a group that improves binding affinity for the target, improves solubility or biodistribution, or increases reactivity with other subunits) for applications without linkers. These molecules include, but are not limited to, the following small ring systems: hexoses, pentoses, and

R, R′ are the same or different, and are selected from the group consisting of hydrogen, small alkyl, alkenyl, hydroxyalkyl, aminoalkyl, chloroalkyl, alkoxy, acyl, carbonyl (where appropriate), carboxyl, and variations thereof.

X, Y, Z are the same or different, and are selected from the group consisting of C, O, S, and N.

Acyclic small molecules may be selected from any class as long as they don't contain pharmacologically unacceptable groups (e.g., formyl halide, acyl halide, reactive formyl, aliphatic ketones, 1,2-dicarbonyls, haloketones, phosphonate esters, sulfonate esters/halides, thiols, or vinyl-ketones). Some examples of pharmacologically acceptable small molecules include, but are not limited to, substituted ureas, thioureas, guanidines, alcohols, ethers, amines, amides, oximes, hydrazones, esters, carboxylic acids, and nitrites.

In another exemplary method, small molecule A and small molecule B can be mixed alone or in the presence of other nonbonding small molecules with the target(s) and a bifunctional crosslinker capable of reacting with both A and B in which one functional group is protected and the other is free. Alternatively, A can be reacted with a crosslinker, and the resulting product can be reacted with B. Functional groups can include any reactive group, including, but not limited to, amine, carboxylic acid, nitrile, and halides. The same or different functional groups can be on A or B. In one example of a pair of small molecules A and B that can react with each other, A contains an amine functional group, and B contains a crosslinker with a carboxylic acid, an activated ester, an anhydride, an acyl halide, or any other group which can react with the amine in an acylation or an alkylation reaction. Linkers can include a molecule which contains only two functional groups or which contains a component in between the functional groups including, but not limited to, polyethylene glycol. Exemplary protective groups include amine protecting groups such as BOC, FMOC, or benzyl. The CBZ protecting group can be used to protect carboxylic acids benzylester, allylester, and nitrites. In one embodiment, protective groups such as nitrobenzyl or azo groups are photoactivated to deprotect a functional group. In another embodiment, linkers containing functional groups which do not react with proteins, and compounds which do not contain the functional groups found on proteins (i.e., amines, carboxylic acids, alcohol, and SH groups), are used. In an example, the compound contains or is modified to contain a halide (e.g., Cl). A linker containing double bonds, triple bonds, halides, or aromatic groups can then be linked to the compound through a Heck coupling reaction or a Suzuki reaction, resulting in a linkage of the linker with the compound without reacting with the protein. Such chemical compounds are available from Aldrich. Linkers and protective groups for the above reactions are available from Advanced Chemtech and Novabiochem among others. In a preferred embodiment, this linking may increase the affinity of binding to the target between 2 and 100 fold or more. Thus, a superior lead with higher affinity can be obtained. This approach can also be used to further enhance the structural diversity of a chemical library in a target directed and biologically relevant way.

These methods are enabled by the high-throughput binding assays described herein. These methods are suitable to rapidly identify small fragments of a drug candidate from a mixture of a large number of other non-binders. The identification, assembly, and further optimization of small molecules towards drug candidates is directed and controlled by the protein. The whole “auto-selection” process takes place under physiological conditions, thus closely approximating the protein's natural structure and environment. The assay technique enables very fast progress and can use widely diverse mixtures of small building blocks. In some embodiments, information about an identified subunit or a product of two or more subunits is used in the synthesis of other compounds.

In other methods, the hits (i.e., the identified ligands with affinity for a target molecule) from binding assays described herein are reacted in the absence of the target molecule to generate products that contain moieties from two or more ligands. If desired, the rate of reaction between the hits can be increased by means including, but not limited to, altering the pH, solvent, catalyst concentration, or temperature of the reaction mixture. The resulting products may then be applied to a binding assay described herein to identify the products with the highest affinity for the target molecule.

5.5. De Novo Synthesis of Leads

Methods identical to those described in Section 5.4 can be used for the de novo synthesis of lead compounds that bind a target molecule. In particular, compounds such as small molecules from a library can be reacted in the presence of a target molecule, such as a target protein. The target molecule promotes or catalyzes the reaction of small molecules that bind the target to generate products with higher affinity for the target. In some embodiments, the small molecules react in the presence of the protein but not in the absence of the protein. In certain embodiments, the amount of product formed in the presence of the protein is at least 2, 5, 10, 20, 30, or 50-fold more than the amount of product formed in the absence of the protein. In other embodiments, the target isolates fragments which can be reacted in the absence of the target to form additional lead compounds. The resulting products can be applied, without prior purification, to a binding assay described herein to identify the products with the greatest affinity for the target. Thus, the assay can serve, and enables the target to serve, as the universal director of combinatorial synthesis.
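By way of a non-limiting illustration, the fold increase in product formed in the presence versus the absence of the target may be computed as sketched below. The function names, the input quantities, and the handling of a zero background are assumptions made for this example only and are not part of the described method.

```python
# Fold thresholds recited in the text for target-promoted product formation.
THRESHOLDS = (2, 5, 10, 20, 30, 50)


def fold_enhancement(with_target: float, without_target: float) -> float:
    """Ratio of product formed with the target to product formed without it.

    Both amounts must be in the same (arbitrary) units. If no product forms
    without the target, the ratio is reported as infinite (an assumption).
    """
    if without_target <= 0:
        return float("inf") if with_target > 0 else 0.0
    return with_target / without_target


def highest_threshold_met(fold: float):
    """Return the largest recited threshold that the fold value reaches, or None."""
    met = [t for t in THRESHOLDS if fold >= t]
    return max(met) if met else None


# Hypothetical amounts: 50 units with the target, 2 units without it.
fold = fold_enhancement(50.0, 2.0)
print(fold, highest_threshold_met(fold))
```

A product that forms only in the presence of the target yields an infinite ratio under this convention and therefore meets every recited threshold.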

6. GENOTYPE TO PHENOTYPE

6.1. Example 1 Breast Cancer

6.1.1. Targets

A biopsy is first collected from at least one breast cancer patient. Laser capture microdissection and ANRNA or RT-PCR may be used in conjunction with microarray analysis to isolate genes which are differentially expressed in the cancerous cells. For example, these techniques may be used to identify transcripts which are present in cancer cells at levels more than 2-fold higher than non-cancerous cells in the same biopsy. Alternatively, the genes may be overexpressed in non-cancerous cells. Genes may further be selected for those which are expressed at such levels in a significant fraction of patients tested.
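The selection criteria above (more than 2-fold higher expression in cancer cells, observed in a significant fraction of patients) may be sketched as follows. The data layout, the gene names, and the 50% patient fraction are hypothetical values chosen for illustration; the text does not specify what fraction is considered significant.

```python
def select_overexpressed(expression, min_fold=2.0, min_fraction=0.5):
    """Select genes overexpressed in cancer cells across patients.

    expression maps a gene name to a list of (cancer_level, normal_level)
    pairs, one pair per patient biopsy. A gene is kept when the cancer/normal
    ratio exceeds min_fold in at least min_fraction of the patients.
    """
    selected = []
    for gene, pairs in expression.items():
        n_over = sum(
            1 for cancer, normal in pairs
            if normal > 0 and cancer / normal > min_fold
        )
        if pairs and n_over / len(pairs) >= min_fraction:
            selected.append(gene)
    return sorted(selected)


# Hypothetical microarray signal levels for three patients.
example = {
    "gene_A": [(10.0, 2.0), (9.0, 3.0), (4.0, 4.0)],  # >2-fold in 2 of 3
    "gene_B": [(3.0, 2.0), (2.0, 2.0), (5.0, 4.0)],   # never >2-fold
}
print(select_overexpressed(example))
```

Swapping the two values in each pair selects genes that are overexpressed in the non-cancerous cells instead, which corresponds to the alternative mentioned above.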

Tissue may be embedded in Tissue Tek OCT medium (VWR), frozen in liquid nitrogen, and sectioned in a cryostat. Sections may be mounted on uncoated glass slides and stored at −80° C. Slides may be fixed in 70% ethanol for 30 s and stained with H&E, followed by 5 s dehydration steps in 70%, 95%, and 100% ethanol and a 5 min dehydration step in xylene. After air drying, the sections may be laser microdissected using the PixCell I and II LCM system (Arcturus Engineering). 5×10⁴ each of morphologically normal breast epithelial cells, malignant invasive breast carcinoma cells and malignant metastatic breast carcinoma cells (e.g., from the axillary lymph node) may be captured. The total RNA may be isolated from each of these cell populations by transferring a transfer film with adherent cells into guanidinium isothiocyanate at room temperature, extracting with phenol/chloroform/isoamyl alcohol, and precipitating with sodium acetate and 10 μg/μL glycogen in isopropanol. The RNA pellet may then be resuspended and treated with 10 units DNase (Gene Hunter) in the presence of RNase inhibitor (Life Technologies) for 2 hours at 37° C. Following reextraction and precipitation, the pellet may be resuspended in 27 μL of RNase-free water. ANRNA or RT-PCR may be performed followed by sequencing. Sequences identified by this technique which are ESTs may be used to select a full length cDNA from a cDNA library (CLONTECH). These cDNAs may be enriched in diseased but not normal cells/tissues but their function may be unknown.

Selected cDNAs may each be tagged with hexahistidine (6his) inserted at the carboxy terminal end and glutathione S-transferase (GST) at the amino terminal end of the gene, each with a protease cleavage site. These genes may be cloned into a Drosophila expression system vector with the BiP protein leader and co-transfected with a hygromycin vector into Drosophila cells using CaPO₄. Cells may be maintained in selective media and gene expression may be induced with copper sulfate (Invitrogen). After 48 hours, supernatant containing 5-10 mg/L of each protein may be collected. The resulting proteins may then be purified from the supernatant by Ni²⁺-NTA chromatography, as a first purification step, and glutathione affinity chromatography, as a second step, followed by removal of the tags by specific protease cleavage. Up to milligram quantities of each protein may be recovered.

6.1.2. Binding, Ligand-Target Pair Selection, and Ligand Identification

Diverse chemical, natural product-like and peptide combinatorial libraries containing up to 2 million ligands may be synthesized in a pooled fashion in fluid phase. In addition, natural product libraries (Terragen, Yonsei), and chemical libraries (Arqule, Coelocath) may be purchased. From 1,000 to 10,000 ligands may be mixed together with 1 μg of protein in a volume of up to 100 μL to have a 1 μM concentration in the well of a 96 well plate. After a 30 minute incubation on ice, the samples may be loaded into 96 well plates with cartridges to serve as HPLC columns for each well (Waters 2790 HPLC). The first cartridge/column may be a size exclusion resin (G25 Pharmacia) to hold the unbound molecules in the resin but allow the bound ligand and protein to pass through. A small and narrow column (e.g., 2 mm length × 5 mm diameter Rocket Column, Biorad) is used to minimize dilution at this step. The next cartridge/column used is a hydrophobic or hydrophilic reverse phase HPLC resin, the choice of which depends upon the hydrophobicity of the ligand library being used. For example, a hydrophobic C18 silica column may be used with less hydrophobic ligands, while a hydrophilic C8 column may be used for more hydrophilic ligands. Another example is the SB8U column from Agilent, which may be used for either hydrophilic or hydrophobic ligands. The reverse phase HPLC may concentrate the small molecules and protein by allowing them to bind onto the resin, after which the small molecules may be eluted from the protein and the resin. The eluants containing the small molecules may be collected in a 96 well plate. These eluants may then be transferred to the mass spectrometer (Micromass Quattro LC) and the spectra determined using the MassLynx and MaxEnt software (Micromass). In this way, theoretically up to 100 ligands per protein may be deconvoluted such that the exact member of the library may be identified except for chirality. Specifically, mass spectroscopy can be used to detect isotopes of compounds or fragmentation patterns, any of which can be used as an alternative to, or in combination with, true molecular weight to identify a compound. In addition, IR or FTIR analysis may be performed to identify ligand functional groups or units. Each ligand may then be synthesized on a larger scale. Peptide ligands may be fused with the TAT transducing sequence.
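As a non-limiting sketch of the deconvolution step, observed masses from the eluant may be matched against the known molecular weights of the library members. The library entries, the mass values, and the 0.5 Da tolerance below are hypothetical and serve only to illustrate the matching logic; they do not reflect the behavior of any particular instrument software.

```python
def match_masses(observed, library, tol=0.5):
    """Map each observed mass to the library members within tol of it.

    observed: iterable of measured molecular weights (Da).
    library:  dict mapping ligand name -> expected molecular weight (Da).
    Several names may be returned for one mass (for example stereoisomers),
    in which case isotope or fragmentation data would be needed to decide.
    """
    hits = {}
    for mass in observed:
        hits[mass] = sorted(
            name for name, mw in library.items() if abs(mw - mass) <= tol
        )
    return hits


# Hypothetical library: two stereoisomers share a mass, one ligand does not.
library = {"ligand_R": 301.2, "ligand_S": 301.2, "ligand_X": 450.0}
print(match_masses([301.3, 449.8, 600.0], library))
```

The first observed mass maps to both stereoisomers, which illustrates why molecular weight alone identifies a library member only up to chirality.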

The affinity of the ligands identified will depend in part on the concentration of the library used in the screen, but should range from at least nanomolar to micromolar. The actual affinity of each ligand may be determined by competition studies. These ligands may then be tested in bioassays.

6.1.3. Bioassays

Where the cDNAs are selected based on their differential expression in cancer cells, the ligands may be tested in assays which detect or measure apoptosis, proliferation, necrosis, angiogenesis, inflammation, or metastatic tumor invasion. According to the invention, assays are designed using models which are as close to the human disease as possible (e.g., pathological tissue biopsies, in vitro tissue models, in vitro disease models, human cell lines) and which are based upon cell lines and are easily applied to primary tissue from human pathology samples. These assays may be developed using tissue from mice transgenic for a gene known to be involved in cancer, bcl-2. Human breast cancer cell lines which may be assayed include: MCF-7, NCI/ADR, HS578T, MDA-MB-231/ATCC, MDA-MB-435, MDA-N, BT-549, T-47D (NCI, ATCC). Other cell lines and tissues may also be used. Non-limiting examples of bioassays are shown in Table 1.

TABLE 1. Bioassays in cell lines, human tissue biopsies, and human tissue biopsies transplanted into host (e.g., nude mouse).

Pathogenic Mechanism: Bioassay [in breast, colon, lung, and prostate cell lines (e.g., breast cancer: MCF-7, NCI/ADR, HS578T, MDA-MB-231/ATCC, MDA-MB-435, MDA-N, BT-549, T-47D)]
Apoptosis: 1.5 hour in vitro incubation with ligand then stain with FITC Annexin V; DAPI stain nuclear morphology confirmation.
Necrosis: 8 hour incubation with ligand (in nude mouse); vital dye stain with propidium iodide or TOTO-3, confirm with MTT assay.
Proliferation: 2 hour incubation with ligand then stain with FITC anti-PCNA; confirm with BrdU.
Angiogenesis: Incubate tumor in nude mouse with ligand, stain with fluorescein factor VIII related antigen to measure endothelial cell density; confirm in migration of cultured human dermal microvasculature endothelial cells towards β-FGF.
Inflammation: 2 hour incubation with ligand and measure TNF, IFN, IL-4, IL-2, IL-10, TGFβ, VCAM, NFκB via ELISA.
Invasion: 30 hour incubation of cells labeled with CFSE dye in matrigel cell invasion chamber; confirm by study in nude mice.
Fibrosis: 48 hour incubation with ligand followed by fibronectin ELISA assay or immunohistochemistry.
Metabolism: 2 hour incubation with insulin and ligand then measure glucose levels; test in 3T3-L1 adipocyte and L6 myocyte cell lines followed by type II diabetes compared to normal patient fat biopsies.
Development/Differentiation: Incubate ligand with either MHC class II-negative cells or single pluripotent ML-IC cells and assess cell fate by cytological and immunological techniques according to either Inaba K et al., 1993, PNAS 90: 3038 or Punzel M et al., 1999, Blood 93: 3750.

6.1.3.1. Apoptosis

Apoptosis may be assayed using a cell membrane phosphatidylserine binding dye (FITC Annexin V; alternative dyes such as Cy5.5 may also be used). Selected ligands for each of the proteins identified in the binding assay may be tested for an effect on apoptosis on various cell lines. From 2×10⁵ to 2×10⁸ cells may be plated in each well of a 96 well plate and medium containing 1 μM to 10 μM of each ligand is added to wells in triplicate. Minimally, a negative (no ligands) and a positive (bcl-2 reactive ligand) control are also performed. After 1.5 hours, FITC Annexin V is added to the wells and incubated with the cells for 15 minutes and, after 3 washing steps, the level of fluorescence is determined using a plate reader.
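One possible way to express the plate reader result, offered only as an illustrative sketch, is to scale the mean fluorescence of the triplicate ligand wells between the negative and positive controls. The normalization formula, the function name, and the fluorescence values are assumptions for this example; the text itself specifies only that the controls and triplicates are performed.

```python
from statistics import mean


def percent_of_positive(sample, negative, positive):
    """Scale replicate fluorescence readings between the two controls.

    Returns 0 when the ligand wells read like the negative (no ligand)
    control and 100 when they read like the positive control.
    """
    neg = mean(negative)
    pos = mean(positive)
    if pos == neg:
        raise ValueError("controls are indistinguishable")
    return 100.0 * (mean(sample) - neg) / (pos - neg)


# Hypothetical raw fluorescence units from triplicate wells.
ligand_wells = [300, 310, 290]
no_ligand_wells = [100, 100, 100]
positive_wells = [500, 500, 500]
print(percent_of_positive(ligand_wells, no_ligand_wells, positive_wells))
```

The same scaling could be applied to the proliferation and necrosis readouts, since they use the same plate layout with negative and positive controls.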

The assays may be demonstrated to be transferable from cells to tissues by using bcl-2 expressing cells and tissues from bcl-2 transgenic mice (Charles River). Ligands which induce apoptosis may be tested on fresh tumor biopsies from breast cancer patients. One advantage of using primary tissue biopsy is that the assay may be performed within two hours of tissue collection, i.e., before the tissue has begun showing the changes associated with ischemia. Small pieces of tumor biopsy may be plated in wells of a 96 well plate and the same assay as above is repeated with each sample in duplicate. After the fluorescence is read, the samples may be stained with DAPI (Molecular Probes, Eugene, Oreg.) and nuclear morphology may be assessed under a fluorescence microscope for nuclear condensation and fragmentation for confirmation. Alternatively, the classic TUNEL (terminal deoxynucleotidyl transferase mediated biotinylated deoxyuridine triphosphate nick end labeling) method to label DNA strand breaks may be used.

6.1.3.2. Proliferation

Cell proliferation may be assayed by exposing cells to a fluorescein labeled anti-PCNA antibody (e.g., PC-10, Santa Cruz Biotechnology) which binds to proliferating cell nuclear antigen (PCNA). Selected ligands for each of the proteins identified in the binding assay may be tested for an effect on proliferation on cell lines. From 2×10⁵ to 2×10⁸ cells may be plated in each well of a 96 well plate. Medium containing 1 μM to 10 μM of each ligand may then be added to wells in triplicate. Minimally, a negative (no ligands) and a positive control are also performed. After 2 hours, FITC anti-PCNA may be added to the wells and incubated with the cells for 15 minutes and, after 3 washing steps, the level of fluorescence may be determined using a plate reader. The PCNA assay has already been used in cells and in tissues (Kulldorff M et al., 2000, J. Clin Epidemiology 53:875). Ligands which inhibit proliferation may be tested on fresh tumor biopsies from breast cancer patients. Small pieces of tumor biopsy may be plated in wells of a 96 well plate and the same assay as above repeated with each sample in duplicate. After the fluorescence is read, the samples may be assessed under a fluorescence microscope to confirm that the cells whose proliferation is being affected are indeed the cancer cells.

In a second approach, cell proliferation is classically measured by looking at BrdU or ³H-thymidine uptake. According to a third approach, cells may be labeled with the CFSE dye (5-and-6 carboxyfluorescein diacetate succinimidyl ester). As the cells proliferate over 7 to 8 generations, the dye is diluted. In a fourth approach, a fluorescence-based AttoPhos assay that measures the endogenous enzyme acid phosphatase may be used to measure cell numbers. Other methods for detecting cells undergoing proliferation may be used, including 7-AAD (7-amino-actinomycin-D), which is used to determine the stage of proliferation, or staining with the Ki67 antibody.
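For the dye dilution approach, the number of generations may be estimated from the drop in per-cell fluorescence, on the assumption that each division splits the carboxyfluorescein label equally between the two daughter cells. The sketch below and its fluorescence values are illustrative only.

```python
import math


def generations_from_dilution(initial: float, current: float) -> float:
    """Estimate cell divisions from dye dilution.

    Assumes the label halves at every division, so that
    current = initial / 2**n and therefore n = log2(initial / current).
    """
    if initial <= 0 or current <= 0 or current > initial:
        raise ValueError("fluorescence values must satisfy 0 < current <= initial")
    return math.log2(initial / current)


# Hypothetical mean per-cell fluorescence before and after culture.
print(generations_from_dilution(1000.0, 125.0))
```

Under the same assumption, 8 generations reduce the signal to 1/256 of its starting value, which is consistent with the dye being followed over only 7 to 8 generations before it approaches background.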

6.1.3.3. Necrosis

Techniques to detect necrosis include but are not limited to the classic techniques of DNA binding dyes such as propidium iodide or TOTO-3. Alternatively, a colorimetric methylthiazole tetrazolium (MTT) assay for the mitochondrial enzyme release can also be used to determine cell viability. In a preferred embodiment of the invention, cell viability is determined using the DNA binding dyes propidium iodide and TOTO-3. Conducting these assays in cell lines may enable one to distinguish between necrosis and apoptosis, which will facilitate distinguishing ligands that have specific effects from ligands which are broadly cytotoxic. This distinction may also be facilitated by performing necrosis and apoptosis assays in parallel. Selected ligands for each of the targets identified in the binding assay may be tested for an effect on necrosis of the cell lines. From 2×10⁵ to 2×10⁸ cells may be plated in each well of a 96 well plate and medium containing 1 μM to 10 μM of each ligand is added to wells in triplicate. Minimally, a negative (no ligands) and a positive control are also performed. After 8 hours, propidium iodide or TOTO-3 is added to the wells and incubated with the cells for 15 minutes and, after 3 washing steps, the level of fluorescence is determined using a fluorescent plate reader.

Necrosis may be a difficult assay to transfer to tissue biopsies because it is generally assayed after at least 8 hours, and considerable necrosis due to ischemia occurs in tissue biopsies after such an interval, providing a high background. To overcome this problem, human biopsy tissue may be transplanted into nude mice, thereby preventing ischemia induced necrosis during the 8 hour assay period. To ensure that growth in the nude mouse does not alter the tumor, a tumor grown in a nude mouse for 1 month may be explanted and tested in the short term apoptosis and proliferation assays as outlined above. The tumor may also be viewed histologically and compared with the fresh tumor explant to assess differences. The ligands which bind to the same target and induce necrosis in 50% of the cases may be injected into the tumor in the animal, and the tumor may be collected after 8 hours and stained with propidium iodide. Histological examination may reveal that the tumor cells are undergoing necrosis while the other cells in the biopsy are not.

6.1.3.4. Angiogenesis

The in vitro assay used to test for a pro- or anti-angiogenic effect assays the migration of cultured human dermal microvascular endothelial cells towards β-FGF or bovine serum albumin (negative control), with increasing concentrations of angiostatin as an inhibitory control and increasing concentrations of the ligands in different wells (Clonetics, San Diego; Polverini P J et al., 1991, Methods in Enzymology 198: 440). Angiogenesis is also a longer term event, so modeling in human biopsies will require growth in nude mice. Should ligands with an anti-angiogenic activity be discovered in the future, they may be assayed by daily injection into the tumor for 3 to 5 days and subsequent removal and staining with fluorescent anti-Factor VIII related antigen to measure endothelial cell density.

Other models for angiogenesis are contemplated by the invention. In vivo models include implantation of Hydron pellets carrying the test molecules into the avascular rat cornea (cornea micropocket assay). Growth of vessels from the limbus toward the pellet at 7 days is scored as a positive response, which can be negated by the removal of the angiogenic or anti-angiogenic protein by antibody on protein A beads (Polverini P J et al., 1991, Methods in Enzymology 198: 440). These vessels can be characterized as to their density, length and luminal size. A similar assay can also be performed in the mouse eye (L Smith, Children's Hospital, Boston). Angiogenic molecules can also be tested in vivo in the rabbit model of hindlimb ischemia (Shyu K G et al., 1998, Circulation 98:2081). Other in vitro tissue modeling systems include endothelial cells in 3 dimensional culture, where they form tubular structures that resemble immature capillaries (Springhorn et al., 1995, In vitro Cell Dev Biol Anim 31, 473; Sierra-Honigmann M R et al., 1998, Science 281:1683). Smooth muscle cell recruitment can be measured using anti-smooth muscle actin immunohistochemistry.

6.1.3.5. Invasion

Tumor invasion may be assayed using a basement membrane cell invasion chamber, which is a chamber coated with Matrigel extracellular matrix. The matrix coats the wells used to separate one chamber from the other in 24 well plates (Becton Dickinson Labware). Selected ligands for each of the proteins identified in the binding assay may be tested for an effect on invasion on the cell lines. Cells labeled with CFSE dye can be measured by FACS or used to follow cell fate in vivo. Alternatively, cells may be labeled with ³H-thymidine or another marker. About 2×10⁵ labeled cells may be plated in each well and medium containing 1 μM or 10 μM of each ligand is added to the top half of the wells in triplicate. After 30 hours in a CO₂ incubator, the membrane chambers may be rinsed 3 times on both sides with DMEM/0.1% BSA and the top surface is scrubbed with a cotton swab. The amount of dye present in the bottom well may be determined using a fluorescent plate reader. In positive wells, the membrane can be cut out and the number of cells on the bottom can be counted. Ligands affecting tumor invasion in this in vitro assay may be further tested in vivo by histological analysis of human tumor biopsies in nude mice.

6.1.3.6. Development and/or Differentiation

Various assays to test the effect of a ligand on the development and/or differentiation of cells, tissues, organs and organisms are contemplated. Non-limiting examples include incubating a ligand with either major histocompatibility complex (MHC) class II-negative cells or single pluripotent myeloid-lymphoid initiating cells (ML-IC) and assessing cell fate by cytological and immunological techniques according to either Inaba K et al., 1993, PNAS 90:3038 or Punzel M et al., 1999, Blood 93:3750.

6.2. Example 2 Diabetes

Peripheral insulin resistance is the major pathogenic mechanism which causes type II diabetes, which is the fourth leading cause of death by disease and the leading cause of blindness, renal failure and amputation. Insulin stimulates glucose uptake in muscle and fat cells, glycogen synthesis in liver and muscle cells, fat synthesis in fat and liver cells, and the inhibition of glucose production in liver cells. Non-insulin-dependent diabetes mellitus (NIDDM) is characterized by impaired insulin-stimulated glucose uptake into skeletal muscle and adipocytes, impaired inhibition of liver gluconeogenesis and potentially misregulated insulin secretion. The pathway is only partially understood and the molecules responsible for peripheral insulin resistance are not known, making it amenable to the methods of the instant invention.

Insulin binds to the α subunit of its dimeric receptor, inducing the tyrosine kinase activity of the receptor's cytosolic β subunit to phosphorylate itself and nearby proteins. Insulin triggers activation of DNA and protein synthesis, activation of anabolic metabolic pathways and inhibition of catabolic metabolic pathways. A series of proteins (IRS-1, IRS-2, IRS-3, IRS-4, Gab-1 and p62 dok) all can bind the phosphorylated insulin receptor and can be substrates for it. IRS-1 appears to be most involved with the receptor, but all of these are activators of phosphatidylinositol 3 kinase, which causes the transport of the striated muscle/adipose tissue specific glucose transporter GLUT4 from the Golgi in the cytoplasm to the plasma membrane, where it transports glucose which is then phosphorylated by hexokinase. (GLUT2 is present on liver and β cells of the pancreas.) Insulin also up regulates glycogen synthase, which catalyzes the final step of the conversion of glucose into glycogen, but it is believed that the defect occurs in the first half of this signaling pathway.

The liver and the muscle account for most of the glucose metabolized and hence cells from these organs will be used in these studies. Diabetic patient muscle biopsies may be challenged with insulin and/or gliclazides, as may be muscle biopsies from healthy individuals. The individuals may be relatives of the patients, some of whom have no overt symptoms of diabetes and a completely normal response to insulin. Defects in insulin action precede overt disease and are seen in nondiabetic relatives of diabetic patients. Differential display cDNA libraries may be prepared from diabetic patients and healthy individuals. A second set of differential display cDNA libraries may be prepared from patient biopsies challenged with insulin and/or gliclazides and biopsies from healthy individuals. These cDNA libraries may then be expressed as proteins. Ligands which bind the expressed proteins may be isolated using the methods described in the invention (e.g., HPLC/mass spectroscopy).

The ligands may be assayed for the effect on glucose uptake following insulin stimulation. 3T3-L1 adipocyte and L6 myocyte cell lines (ATCC) may be used as cell models for glucose metabolism. From 2×10⁸ to 1×10¹⁰ cells may be plated in each well of a 96 well plate and medium containing a known concentration of glucose and 1 μM to 10 μM of each ligand is added to wells in triplicate. Minimally, a negative (no insulin, no ligands) and a positive (insulin, no ligands) control are performed. Insulin is next added to the wells at a low and a high concentration. After 2 hours incubation in a CO₂ incubator, glucose levels may be determined using a glucose meter. The ligands which affected glucose metabolism following insulin stimulation in the cell lines may then be tested using the same assay with fresh skeletal muscle and adipose tissue biopsy from Type II diabetic patients. Cells suspended from the tissue biopsy may be plated at the same density in wells of a 96 well plate and the same assay as above repeated with each sample in duplicate. If the ligands decrease peripheral insulin resistance in these tissue biopsies, the ligand-gene combination may represent a validated target in the treatment of peripheral insulin resistance, which may be tested further and mapped in the metabolic signaling pathway of insulin.

6.3. Identification of Targets in Molecular Pathways of Known Genes

The approach used above may be used to identify and determine the function of unknown genes within the signaling pathways of pluripotent secreted proteins and to isolate the therapeutic effect from the toxic effect in a tissue specific way. TGFβ1 is a well known potent growth inhibitor in many cell types, and the type II TGFβ receptor, Smad 2 or Smad 4 are known to be mutated in a number of cancers (Kim S J, 2000, Cytokine Growth Factor Rev. 11: 159). Some tumor suppressor genes (DPC4) are members of this SMAD family and are potent down regulators of T cell immune responses (Prud'homme G J, 2000, J. Autoimmun. 14:23). Modulation of this growth inhibition and apoptosis induction pathway may be used to develop novel therapies to inhibit cancer cell growth, induce tolerance of T cells in autoimmunity and break tolerance to cancer antigens by blockade of this TGFβ pathway.

One of the limiting factors has been that TGFβ1 also induces deposition of the extracellular matrix, including up regulation of fibronectin, collagen, plasminogen activator inhibitor-1 and tissue inhibitors of matrix metalloproteases, while down regulating matrix degrading proteases such as interstitial collagenase (Massague J, 1990, Ann Rev Biochem 6:597). Overproduction of matrix components is the major finding in tissue fibrosis, an important cause of end stage renal and other diseases (Blobe G C, 2000, NEJM 342:1350). Decreased fibronectin production is often observed in cancer, causing decreased cellular adhesion and increased metastasis (Kornblihtt et al., 1996, FASEB J 10:248). TGFβ induces these effects on ECM through a Smad independent pathway in which c-jun N-terminal kinase (JNK; a member of the MAP kinase family) is activated to modulate cJUN (a member of the AP-1 family of transcription factors) and ATF-2 (another transcription factor) (Hocevar et al., 1999, EMBO J. 18:1345). The pluripotent effects of TGFβ may be dissected out by targeting jun and smad pathways separately.

To this end, primary human T cells and fibroblasts may be split into two, and half of the cells may be transfected with a retroviral vector containing antisense jun or SMAD. Alternatively, this may be achieved with a different vector, or the cells may be transduced with a peptide reactive with either smad or jun. The resulting cell lines may then be stimulated with TGFβ, and cDNAs which are differentially expressed between stimulated and unstimulated cells, and then between cells with either pathway blocked, may be cloned using microarray analysis or other techniques of differential expression. Once cDNAs have been identified the expression of which is associated with only one of the pathways (but the function of which is unknown), these cDNAs can then be expressed as proteins, and ligands binding to them can be isolated using the biochemical binding assay and resolution by HPLC and mass spectroscopy. The ligands can then be tested for the ability to block or induce either proliferation (in a PCNA based assay as described above) or secretion of the extracellular matrix. The extracellular matrix assay would measure deposition of fibronectin, a major component of the extracellular matrix, over a 48 hour period in a 96 well plate using an ELISA assay for fibronectin. In this way, genes can be identified and targets can be validated which are associated with the antiproliferative effect of the protein but not the profibrotic effect, and vice versa. A similar approach may be used to look at any stimulus to cells or tissue to identify new members of the molecular pathway and validate them as drug targets.
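The assignment of differentially expressed cDNAs to one pathway or the other can be sketched with simple set operations. The reasoning assumed here is that a cDNA belongs to the jun pathway if TGFβ induces it in unmodified cells and in smad-blocked cells but not in jun-blocked cells, and conversely for the smad pathway. This interpretation, the function name, and the gene identifiers are illustrative assumptions rather than a procedure stated in the text.

```python
def assign_pathway(induced_control, induced_jun_blocked, induced_smad_blocked):
    """Split TGF-beta-induced cDNAs by the pathway their induction depends on.

    Each argument is the set of cDNAs differentially expressed upon
    stimulation in unmodified, jun-blocked, or smad-blocked cells.
    Returns (jun_dependent, smad_dependent).
    """
    # Induction survives the smad block but is lost when jun is blocked.
    jun_dependent = (induced_control & induced_smad_blocked) - induced_jun_blocked
    # Induction survives the jun block but is lost when smad is blocked.
    smad_dependent = (induced_control & induced_jun_blocked) - induced_smad_blocked
    return jun_dependent, smad_dependent


# Hypothetical cDNA identifiers.
control = {"cDNA_1", "cDNA_2", "cDNA_3", "cDNA_4"}
jun_blocked = {"cDNA_2", "cDNA_3"}
smad_blocked = {"cDNA_1", "cDNA_3"}
jun_only, smad_only = assign_pathway(control, jun_blocked, smad_blocked)
print(sorted(jun_only), sorted(smad_only))
```

In this example cDNA_3 is induced regardless of either block and cDNA_4 is lost under both, so neither is assigned to a single pathway.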

7.1. Phenotype to Genotype

7.1.1. Phenotype Detection

Tumor cell apoptosis and proliferation assays described in Sections 6.1.3.1 and 6.1.3.2 may be adapted to high throughput screening using, for example, a 384 well plate format (Applied Biosystems FMAT 8100). Apoptosis and necrosis may be assayed simultaneously. For apoptosis and necrosis, the Cy5.5 Annexin V assay and TOTO-3 reagents, respectively, may be used (Applied Biosystems). Cy5.5 labeled anti-PCNA antibody (PC-10, Santa Cruz Biotechnology) may be used to assay cell proliferation. Non-limiting examples of human breast cancer cell lines which may be assayed include: MCF-7, NCI/ADR, HS578T, MDA-MB-231/ATCC, MDA-MB-435, MDA-N, BT-549, T-47D (NCI, ATCC). Non-limiting examples of human prostate cancer cell lines which may be assayed include: DU-145, PC-3, LNCaP. Non-limiting examples of human colon cancer cell lines which may be assayed include: COLO 205, HCC-2998, HCT-15, HCT-116, HT29, KM12, SW-620. Non-limiting examples of human lung cancer cell lines which may be assayed include: A549/ATCC, EKVX, HOP-62, HOP-92, NCI-H23, NCI-H226, NCI-H322M, NCI-H460, NCI-H522. From 1×10⁵ to 1×10⁸ cells may be plated in each well of a 384 well plate. Medium containing 1 pM to 1 M, and preferably 1 μM to 10 μM, of each potential ligand in a ligand library (non-limiting examples of which are listed in section 5.1.2 above) is added to wells, and the ligands are tested in triplicate. Negative (no ligands) and positive (staurosporine) controls are included. The ligands having the phenotypic effect at a concentration of ≤20 μM are good candidates for target identification according to the invention.
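The selection of candidates for target identification can be sketched as a simple filter on the lowest concentration at which each ligand showed the phenotypic effect. The data layout and ligand names below are hypothetical; only the 20 μM cutoff is taken from the text.

```python
def select_hits(min_active_conc_um, threshold_um=20.0):
    """Return ligands active at or below the threshold concentration.

    min_active_conc_um maps a ligand name to the lowest concentration (in
    micromolar) at which the phenotypic effect was seen, or None if the
    ligand showed no effect at any tested concentration.
    """
    return sorted(
        ligand
        for ligand, conc in min_active_conc_um.items()
        if conc is not None and conc <= threshold_um
    )


# Hypothetical screening results.
screen = {"lig_1": 1.0, "lig_2": 20.0, "lig_3": 50.0, "lig_4": None}
print(select_hits(screen))
```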

7.1.2. Target Identification

An important advantage of the invention is that, unlike the prior art, the target of a ligand which is found to have an effect in one or more bioassays may be identified using the ligand. There are a number of approaches which may be used to identify the target according to the invention.

In a first series of embodiments, a potential target is a protein displayed on the surface of a cell. According to one non-limiting example, a full-length human cDNA library is expressed in the pDisplay vector (Invitrogen). This vector targets the protein to, and anchors it in, the cell membrane on the surface of eukaryotic cells. In another non-limiting embodiment of the invention, a full-length human cDNA library is expressed in the pYD1 yeast display vector or a similar vector transfected into the EBY100 Saccharomyces cerevisiae strain (Invitrogen). In still another non-limiting embodiment of the invention, a full-length human cDNA library is expressed on the surface of insect cells using a baculovirus vector (Ernst W et al., 1998, Nucleic Acids Research 26:1718). These systems allow full-length proteins to be expressed on the surface, as opposed to prokaryotic systems which only allow peptides to be expressed.

In alternative embodiments, a polynucleotide library can be expressed as a peptide alone or as a fusion on the surface of a cell or a virus (e.g., bacteriophage, T7, or M13). Non-limiting examples include a polynucleotide library generated from a human or an infectious agent. In a specific embodiment of the invention, a cDNA library is expressed as dodecapeptides in the pFliTrx vector (Invitrogen) or similar. According to this embodiment, when the vector is expressed in E. coli, the peptide is displayed in the active site loop of the thioredoxin protein and inside the bacterial flagellin gene. In another embodiment of the invention, potential targets may be displayed as peptides on a ribosome display system in which the peptide is fused to the RNA encoding it by treatment with puromycin (Roberts R W et al., 1997, PNAS 94:12297). All other display systems (including but not limited to retrovirus and adenovirus) may be used in accordance with the invention to display cDNAs or peptides.

7.1.3. Separation

Potential targets displayed by any of the above methods may be exposed to the ligand. The ligand may be either immobilized on a surface, bead or column, or it may be in solution, depending on the separation method to be used. In a first embodiment of the invention, the ligands may be directly immobilized on the surface, directly labeled or detected. In a second embodiment of the invention, the ligands may be derivatized with an affinity label to facilitate collection of the ligand-target pair where the target is displayed as illustrated in the foregoing examples. Non-limiting examples of such affinity labels include biotin, digoxigenin, or an antibody. Displayed targets which bind the ligand may then be separated from those which do not bind, and the sequence encoding the target is identified by standard cloning and DNA sequencing techniques.

In a first embodiment of the invention, cells can be “stained” with fluorescently labeled or biotinylated ligand (the latter combined with FITC avidin) and sorted using a flow cytometer (MoFlo HTS Cytometer, Becton Dickinson FACS) into wells of a plate, a tube, etc. The cells may then be grown using standard cell culture techniques. According to a first non-limiting example, the gene encoding the drug's receptor may then be cloned by plasmid recovery from COS 1 cells by using the effect of the large T antigen on the SV40 origin of replication. According to a second non-limiting example, PCR may be used to recover the plasmid insert.

In a second embodiment of the invention, cells, viral particles or peptide-nucleotide fusions may be selected using drug coated magnetic beads, a drug coated surface (e.g., a well for panning) or a drug coated column. A high density of drug ligands on the surface, beads or column is desirable to increase the avidity of low affinity interactions. The drug may be attached to the surface, beads or column via an affinity label (e.g., avidin, digoxigenin), and elution may be achieved after one or more washing steps. In the case of magnetic beads, magnets may be used to isolate the beads during the wash to recover bound cells, viral particles or peptide-nucleotide fusions. In the case of panning, the supernatant is poured off after each successive washing step, with the cells, viral particles or peptide-nucleotide fusions retained in the wells. Elution from a column may be achieved by standard techniques. In the case where the ligands were derivatized with an affinity label, cells, viral particles or peptide-nucleotide fusions may be eluted from the column by applying excess free affinity label to the column.

Once separated, target expressing cells or viral particles can be grown as appropriate. Then the cDNA encoding the target may be recovered by standard molecular biology techniques (e.g., plasmid recovery or PCR). In the case of purified peptide-nucleotide complexes, the partial cDNA sequence would be identified using RT PCR. Using the above approach, the target can be purified and cloned using one or more rounds of selection. In this way, the DNA sequence encoding a previously unknown drug target can be isolated and used to clone the cDNA encoding the drug target.

Once the cDNA encoding the drug target has been identified, the cDNA can be used to study differential expression in cells from disease tissues as in section 6.1. If the target is differentially expressed between disease and normal cells, specificity is established, and the ligands interacting with that target may be tested in in vitro and in vivo bioassays for that disease.

Thus the target associated with a function in the phenotypic assay is identified employing the invention.

7.2. Target Identification by Proteomics

Target identification may also be achieved by adapting the method set forth in section 6.1.2 to combine the ligand of interest with one or a plurality of potential targets, collecting ligand-target pairs, and optionally dissociating the ligand and target. Subsequently, the target may be identified. In one embodiment of the invention, the target is a protein which may be identified by common techniques (e.g., amino acid sequencing, mass spectroscopy and/or NMR). Once the protein has been identified, its association with diseased cells may be determined using standard proteomics techniques.

8.1. Mapping Signaling Pathways

Once a number of genes have been shown to be involved in a particular molecular pathway of disease pathogenesis, a targeted component can be mapped within the molecular pathway relative to other molecular pathway components. Ligands which bind to different molecular pathway components may be derivatized with photoactivatable crosslinkers. At least one of the known molecular pathway components is fused with a marker such as GFP. Then the following may be combined in vivo or in vitro: (i) a derivatized ligand which binds the known molecular pathway component, (ii) the marked pathway component, e.g., GFP fusion protein, (iii) at least one derivatized ligand which binds or may bind another molecular pathway component, and (iv) other molecular pathway components. The crosslinking stimulus is applied, and each component of the resulting complex is identified. In this way each molecular pathway component may be mapped relative to the other components with which it interacts. A further advantage of the invention is that pathway effectors may be identified by this method. In addition, the profile of each pathway component may be compared with known drugs acting via that pathway, if any, and comparative studies can be done in cell based assays of different diseases caused by that pathogenic pathway. This information can be used to validate and select the best target for a given disease indication. As an alternative, this information may be used to select the best therapies for a particular patient using pharmacogenetics.

9.1. Lead Optimization

Because a large number of chemical leads may be characterized at the biochemical and phenotypic levels, a structure activity relationship (SAR) may be established to serve as a basis for lead optimization. If a few molecules with similar activities are identified, the SAR can be determined by comparing their structures with activity in the assays. The target directed synthesis technology can be employed to crosslink or react molecules binding close to each other, indicating whether their activity is mediated through the same active subsite on the protein or through different subsites on the protein target. In this way additional, different functional subsites on the target can be mapped, and different mechanisms can be interpreted from the phenotypic findings with molecules binding to those subsites (e.g., agonist vs. antagonist).

The second use of target directed synthesis is to increase the affinity of a ligand for its target and thus make the ligand more useful to link phenotype to genotype, as well as making a better drug lead. Photoactivatable crosslinkers on one of the functional groups of the ligand scaffold may be used to link ligands bound to the target, thus using the target molecule as a template. This linking should increase the affinity of binding to the target by at least 2- to 10-fold and further enhance the structural diversity of the library in a target directed and biologically relevant way.

10. IN SILICO APPROACH TO LINKING PHENOTYPE WITH GENOTYPE

The instant invention provides a method to establish a chemical fingerprint of ligand-target (genotype) and ligand-bioassay (phenotype) for each ligand or set of ligands, which can be matched in silico to associate phenotype with genotype.

The present invention provides a first information retrieval system wherein ligand-target pairing experimental data will be stored. The present invention provides a second information retrieval system wherein the effects of each ligand in each bioassay tested will be stored. The present invention provides a third information retrieval system wherein the function and/or the expression pattern of each target, if known, will be stored. These systems may be optionally integrated to facilitate use.

In one embodiment of the invention, data entered into the systems may be obtained by a shotgun approach wherein all targets are tested for binding to ligands or all ligands are tested in each bioassay. For example, the set of targets may encompass up to all expression products of up to and including all genes in the genome of a selected organism. Each target is then used to screen a library of ligands to identify ligands which bind. This data is entered into the first information retrieval system.

According to another example, the effect of each member of a large combinatorial chemical library of ligands may be assayed in each available bioassay. This data is entered into the second information retrieval system.

In another embodiment of the invention, data entered into the system is obtained by a focused analysis of ligands which bind selected targets in a specific disease or the phenotype induced by selected ligands in selected bioassays. This data is entered into the first or second information retrieval system as appropriate.

These systems may then be used to guide the user in predicting target function even in the absence of differential expression data or a particular disease focus. In addition, these systems may guide the user in selecting ligands and targets with specific effects. A further advantage is that this system may reduce the number of binding experiments and bioassays necessary. Other advantages will be apparent to one skilled in the art.

In one embodiment of the invention, a user selects a target of interest. Next, the user identifies ligand(s) which bind the target of interest, either experimentally or from the first information retrieval system. The user then queries the second information retrieval system with the identified ligand(s) to determine the phenotype(s) associated with each ligand. In this way, a target may be associated with one or more phenotypes.

In another embodiment of the invention, a user selects a phenotype of interest. Next, the user identifies ligand(s) which modulate the selected phenotype, either experimentally or from the second information retrieval system. The user then queries the first information retrieval system with the identified ligand(s) to identify target(s) to which the ligand(s) bind. In this way, a phenotype may be associated with one or more targets.

In another embodiment of the invention, these information retrieval systems may be combined with target functional information and/or expression analysis data to guide the user in validating targets and drug leads. In a first example of this embodiment, a user may choose targets X and Y, which are proteins. The user obtains expression data which indicates that the gene encoding X is expressed in normal cells but is not expressed in tumor cells. The user obtains further expression data which indicates that the gene encoding Y is not expressed in normal cells but is expressed in tumor cells. The user then queries the first information retrieval system. The results of this query are shown in Table 2.

TABLE 2

Target    Ligands that Bind
X         1
X         2
X         3
Y         2
Y         3
Y         4

The user then queries the second information retrieval system. The results of this query are shown in Table 3.

TABLE 3

Ligands    Phenotype
1, 2, 3    Angiogenesis
2, 3, 4    Proliferation

According to this example, the user may select target Y as a valid target for cancer therapy and may select ligand 4 for its ability to specifically bind Y and not X. Thus, the invention is able to guide the user in validating targets and identifying drug leads.
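The two queries in this first example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries stand in for the first and second information retrieval systems and hold just the entries of Tables 2 and 3, and the function names are hypothetical rather than part of the invention.

```python
# Stand-in for the first information retrieval system (Table 2):
# target -> ligands that bind it.
BINDING = {"X": {1, 2, 3}, "Y": {2, 3, 4}}

# Stand-in for the second information retrieval system (Table 3):
# phenotype -> ligands associated with it.
PHENOTYPES = {"Angiogenesis": {1, 2, 3}, "Proliferation": {2, 3, 4}}


def specific_ligands(target, off_targets):
    """Ligands that bind `target` and none of the `off_targets` (hypothetical helper)."""
    hits = set(BINDING[target])
    for other in off_targets:
        hits -= BINDING[other]
    return hits


def phenotypes_of(ligand):
    """Phenotypes recorded for a ligand in the second system (hypothetical helper)."""
    return sorted(p for p, ligands in PHENOTYPES.items() if ligand in ligands)


# Y is expressed in tumor cells only, so look for ligands that bind Y but not X.
selective_for_y = specific_ligands("Y", off_targets=["X"])
print(selective_for_y)      # the ligand(s) binding Y and not X
print(phenotypes_of(4))     # phenotype(s) recorded for ligand 4
```

With the table entries above, the only ligand that binds Y and not X is ligand 4, which matches the selection described in the example.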

In a second example of this embodiment, the phenotype to genotype approach has been used to determine that ligands 1, 2, and 3 induce apoptosis in a bioassay; ligands 3, 4, and 5 stimulate angiogenesis; and ligands 1, 3, and 6 induce necrosis. This information is stored in an information retrieval system. In a high throughput binding assay, it is discovered that ligands 3 and 4 bind to target X with K_d < 50 μM. A search of the information retrieval system will indicate to one skilled in the art that (i) target X may be involved in angiogenesis, (ii) ligand 3 is a poor candidate for a drug lead, and (iii) ligand 4 may be a good candidate for a drug lead.
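The reasoning in this second example can be made explicit with a short sketch. The rule used here, that a ligand active in exactly one bioassay is treated as selective and a ligand active in several as promiscuous, is an assumption introduced for illustration; the text does not state a numerical criterion. All names are hypothetical.

```python
# Bioassay results stored in the information retrieval system (from the example).
PHENOTYPE_HITS = {
    "apoptosis": {1, 2, 3},
    "angiogenesis": {3, 4, 5},
    "necrosis": {1, 3, 6},
}

# Ligands found to bind target X in the high throughput binding assay.
BINDERS_OF_X = {3, 4}


def phenotypes_by_ligand(hits):
    """Invert phenotype -> ligands into ligand -> phenotypes."""
    profile = {}
    for phenotype, ligands in hits.items():
        for ligand in ligands:
            profile.setdefault(ligand, set()).add(phenotype)
    return profile


profile = phenotypes_by_ligand(PHENOTYPE_HITS)

# (i) Phenotypes shared by every ligand that binds X suggest X's function.
shared = set.intersection(*(profile[ligand] for ligand in BINDERS_OF_X))

# (ii)/(iii) Assumed rule: one phenotype = selective, several = promiscuous.
selective = sorted(l for l in BINDERS_OF_X if len(profile[l]) == 1)
promiscuous = sorted(l for l in BINDERS_OF_X if len(profile[l]) > 1)

print(shared, selective, promiscuous)
```

Under that assumed rule the sketch reproduces the three conclusions of the example: angiogenesis is the only phenotype common to both binders, ligand 3 is active in all three bioassays, and ligand 4 is active only in the angiogenesis assay.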

11. AUTOMATION OF THE METHODS OF THE INVENTION

A highly automated approach such as those shown diagrammatically in FIGS. 18 and 19 is another embodiment of the present invention. This includes a high throughput expression vector construction, protein production, and purification facility capable of producing >20 proteins a week in sufficient amounts to determine ligands from a compound library. This is followed by the use of a high throughput assay such as the Chemical Array Assay to identify scaffold target pairs. These scaffold target pairs comprise the chemical array database, which has the uses outlined in FIG. 17.

For high throughput expression vector construction, a cDNA encoding one of the proteins in the human proteome from, for example, NCBI, Stratagene, or Incyte is inserted into a DES expression vector (Invitrogen) using an automated fluid handling system (Tecan) in a 96 well format. The DES expression vector adds a secretion signal and a his-tag to the encoded protein so that it is secreted into the media and can be purified using a nickel column that binds the his-tag.

In an exemplary method, the sequence of a cDNA of interest is verified by DNA sequencing, and the 5′ end of the cDNA is PCR tagged with a 4-mer. The cDNA is Topoisomerase (TOPO) cloned into pMT/BiP/V5-His A, B, or C (Invitrogen), depending on reading frame, for expression in insect cells or into pcDNA3.1D/V5-His-TOPO (Invitrogen) for expression in 293 cells using standard methods (FIG. 20). To assure that the resulting protein will be secreted from the 293 cells, the cDNA may be analyzed by sequence homology to determine if a secretory leader is present and a transmembrane domain is not present. A secretory leader (e.g., Ig κ chain leader or CD59 leader) may be added to the 5′ end of the cDNA, and the transmembrane domain may be deleted from the cDNA using standard molecular biology techniques. This method is particularly useful if there is a single transmembrane domain. If there are multiple transmembrane domains, or one wants to use a form of the protein which can be integrated into micelles or a membrane, one can produce the protein as a membrane protein (Section 11.1).

The vectors are then transfected into competent E. coli cells, and the cells (e.g., 2 colonies per vector) are propagated (e.g., in an overnight culture). The expression vector can be extracted from the E. coli cells using a robotic fluid handler to add a standard lysis reagent to lyse the cells and to apply the lysate to Qiagen columns to purify the expression vector. In a particular embodiment, the lysate is purified using the QIAwell 96 Ultra Plasmid Kit, which uses a QIAfilter 96 well plate for lysate clearing, QIAwell 96 well plates for purification of the plasmid DNA, and QIAprep 96 well plates for desalting, each plate sequentially on the QIAvac 96 automated vacuum device. If desired, cells containing the expression vector with the cDNA insert in the proper reading frame are selected using standard methods. For example, the expression vector can be restriction enzyme digested or sequenced to determine whether it contains the cDNA insert in-frame.

The expression vector containing the insert is then transfected with Cellfectin into Drosophila S2 cells (Invitrogen) using standard calcium phosphate transfection methods and grown in Drosophila expression media (Invitrogen) in 5-12 flasks per vector in the SelecT automated tissue culture system (Automation Partnership) (FIG. 21). Each SelecT system can handle up to 150 flasks or up to 40 separate cell lines expressing different proteins, and using multiple SelecT systems in parallel can increase throughput to 600 proteins per week. In one possible method, copper sulfate is added to the medium after 24 hours to induce protein expression, and on days 3 and 7 the supernatant is collected for protein purification. In other methods, transient expression is induced on day 3, and the supernatant is harvested on days 4 and 6. Using 20 SelecT robots that each handle 150 T175 flasks with 150 mL media and Drosophila cells that express approximately 10 mg/L, every two flasks may produce 2 to 4 mg of protein with one harvest. To increase the amount of protein, the supernatant can be harvested additional times, such as 1, 2, 3, or more additional times. If five flasks are used per protein, each SelecT system produces 30 proteins. Thus 2 to 4 mg of protein can be produced for 600 proteins (i.e., 30 proteins for each of the 20 SelecT robots) per week.
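The throughput and yield figures in this paragraph follow from simple arithmetic, sketched below. The function names are hypothetical, and the calculation assumes that the full media volume of each flask is recovered in a single harvest, which the text does not state.

```python
def proteins_per_week(robots, flasks_per_robot, flasks_per_protein):
    """Distinct proteins produced per week across all robots."""
    return robots * (flasks_per_robot // flasks_per_protein)


def yield_mg(flasks, media_ml_per_flask, titer_mg_per_l):
    """Protein from one harvest, assuming all media is recovered."""
    return flasks * media_ml_per_flask / 1000 * titer_mg_per_l


# 20 robots, 150 flasks each, 5 flasks per protein.
weekly = proteins_per_week(robots=20, flasks_per_robot=150, flasks_per_protein=5)

# Two T175 flasks with 150 mL media at approximately 10 mg/L.
two_flask_yield = yield_mg(flasks=2, media_ml_per_flask=150, titer_mg_per_l=10)

print(weekly, two_flask_yield)
```

The first figure reproduces the 600 proteins per week quoted above (30 proteins on each of 20 robots), and the two-flask yield of about 3 mg falls within the stated 2 to 4 mg range.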

The supernatant is passed through the nickel column in 96 well format (Qiagen QIAexpress protein purification system or Qiagen nickel affinity magnetic plates) on a Biorobot (Qiagen). A Tecan fluid handler then transfers an aliquot of this protein to PHAST gel (Pharmacia) for SDS analysis, bioassays, or other quality control (QC) analysis. The rest of the sample is transferred by the reagent storage retrieval system (Haystack) to the Chemical Array Assay (e.g., in any of the assay methods described herein) and to the freezer for storage. For example, a robotic fluid handler (Tecan) can be used to combine the purified protein target with a library of candidate ligands to allow one or more of the candidate ligands to bind the target protein in the wells of a 96 well plate. This 96 well plate can then be transferred to an HPLC (Waters 2790), which can inject the assay mixture containing the target protein and candidate ligands from 96 well plates and run up to 6 columns in parallel for the isolation of the target protein with bound ligands. The fraction containing the target with bound ligand can be collected using a fraction collector (Gilson). In an alternative embodiment, a robotic fluid handler (Tecan) is used to combine the purified protein target with a library of candidate ligands to allow one or more of the candidate ligands to bind the target protein in the wells of a 96 well plate. This 96 well plate contains, for example, cartridges with a resin capable of separating target proteins from unbound ligands to isolate the target protein with bound ligands into a second 96 well plate upon evacuation by a robot (Tecan or Qiagen). In an alternative embodiment, the binding occurs in a 96 well plate, and then a fluid handler (Tecan) transfers the sample to a second 96 well plate including the cartridges for separation. In still another embodiment, the cartridges are spin columns, which are available in multiwell formats (Pharmacia). Chip based and capillary LC based separations can also be used.
A detergent or other denaturant can be added by the fluid handler (Tecan) to release the bound ligands from the protein, and then the released ligands are added to an appropriate instrument for analysis. For example, the ligands can be injected into a mass spectrometer using a reverse phase column on an HPLC containing an autoinjector (Waters), spotted on a filter for MALDI-TOF mass spectrometry analysis, or applied to an NMR, IR, FTIR, or UV spectrometer. In an alternative embodiment, the target protein with bound ligands is loaded or spotted onto the 96 well format MALDI-TOF (Bruker Daltonics) using a fluid handler (Tecan). In another alternative embodiment, the target protein with bound ligands is evacuated onto a filter (for example, nitrocellulose) in a 96 well format by evacuation with a robot (Tecan). In another embodiment, the evacuation onto this same filter is performed in the same step as the evacuation of the 96 well cartridges by placing the filter between the cartridges and the vacuum device. The MALDI-TOF then dissociates the target protein and ligands from each of the 96 spots and generates a mass spectrum for the compound and/or complex. After data processing by the information systems described herein, the identity of the ligand and its target are entered into the Chemical Array Database. Any of these methods can be performed in 384 well, 1536 well, chip based, or other formats. Similarly, any of the data can be entered and managed using a laboratory information management system (LIMS) based on IDBS Activity Base or Price Waterhouse, or other LIMS software/systems.

Similar methods can be applied for other transient expression based production systems including, but not limited to, HEK293, CHO, or COS cells. Alternatively, other automated or semi-automated production systems can be used, such as roller bottle systems, stirred tank systems (e.g., Celligen Plus from New Brunswick), or capillary cell culture systems (Amicon). In another embodiment, a semi-automated process, such as a 1 L or larger bioreactor from New Brunswick, is used to grow cells such as HEK293 cells (Life Technologies) transiently transfected with expression constructs constructed as described above based upon the pcDNA family of vectors (Invitrogen). Transiently transfected CHO cells can also be used. The transfection in these cell types can be efficiently achieved using Lipofectamine 2000 (Life Technologies). In alternative embodiments, other transfection strategies are used (for example, electroporation, calcium phosphate, Lipofectin, Lipofectamine Plus (Life Technologies), or other standard techniques). These cells are grown in DMEM or in other standard media, with serum or in serum free forms, using standard methods. In addition, alternative expression vectors can be used, such as those appropriate for the various cell lines mentioned, as indicated in the catalogue of Invitrogen, other vector companies, or the scientific literature, or those which would be apparent to those skilled in the art.

In an exemplary method for the semi-automated transfection, protein production, and purification of 600 proteins per week (2 to 4 mg each), CHO cells or HEK 293 cells are used. In particular, CHO cells (e.g., CHO-F line stably transfected with T antigen) or 293 cells are grown in suspension culture to a volume of 1.4 L in a 2.2 L bioreactor (New Brunswick) or bag (Wave System) or a large vessel (e.g., 5.5 L or 10.5 L vessels). The cells are allowed to settle or are pelleted by centrifugation. Alternatively, the HEK 293 or CHO cells are grown as confluent cells (e.g., grown using the semi-automated Cell Mate), and Lipofectamine 2000 is used as the transfection agent. The media is temporarily removed, and the cells are transfected with the expression construct and DMRIE-C reagent in a 60 mL volume using standard methods, such as Invitrogen's protocol. The media is added back to the bioreactor or bag, and the cells are cultured. After two to three days, the supernatant is harvested. The protein is analyzed and purified as described above for the protein production methods using Drosophila cells. For large scale protein production, 150 BioFlow 110 Bioreactor Systems with 4 vessels per system (New Brunswick) can be used. Because mammalian cells produce less protein (approximately 1 mg/L) than insect cells (approximately 10 mg/L), 2 to 4 L of culture are used to produce 2 to 4 mg of protein.
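The culture volumes quoted here can be checked with a short calculation. This is a rough sketch with hypothetical names; it uses the approximate mammalian titer of 1 mg/L given above and ignores purification losses, which the text does not quantify.

```python
def culture_volume_l(target_mg, titer_mg_per_l):
    """Litres of culture needed to reach target_mg at a given titer."""
    return target_mg / titer_mg_per_l


MAMMALIAN_TITER = 1.0  # mg/L, approximate figure from the text

low = culture_volume_l(2, MAMMALIAN_TITER)   # volume for 2 mg
high = culture_volume_l(4, MAMMALIAN_TITER)  # volume for 4 mg

# 150 bioreactor systems with 4 vessels each.
vessels = 150 * 4

print(low, high, vessels)
```

The result is 2 to 4 L of culture per protein and 600 vessels in total. The match between 600 vessels and 600 proteins per week suggests roughly one vessel per protein per week, although the text does not state this explicitly.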

If desired, a clone selection step can be performed, resulting in stable producer cell line based production systems (e.g., CHO or E. coli based systems). Exemplary clone selection steps include growing the cells in the presence of a selective antibiotic, e.g., Geneticin, in a multi-well format to select cells likely to contain the expression vector, and then checking each well for the presence of the secreted protein using a standard ELISA or other standard assay to detect the his-tag present in the protein.

In addition, high throughput production and screening techniques can be used for any of the methods of the invention. For example, any binding assay (chip, filter, radiolabelled, fluorescent, surface plasmon resonance, etc.), production method (e.g., mammalian cells such as CHO, HEK 293, or COS; insect cells such as Drosophila; bacteria such as E. coli; or yeast such as Pichia), production system (e.g., bioreactors (New Brunswick systems by Brandel), flask based, cell cube, surface bound, suspension cultures, serum containing media, or serum free media), and any purification method (His tag/nickel column, GST/glutathione, intein, or other affinity column) can be used. Any of these automated and/or high throughput methods can be performed with multiple systems acting in parallel, such as multiple robotic systems (such as multiple SelecT robots from Automation Partnership). For example, 2, 4, 5, 6, 8, 10, 10², 10³, 10⁴, 10⁵, 10⁶, or more targets can be assayed in parallel to select ligands that bind the targets. Similarly, 2, 5, 10, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ or more small molecules of interest can be assayed in parallel to select target molecules that bind the small molecules. Because columns with GFF resin can be regenerated in only seven minutes, multiple assays can also be performed sequentially using the same column with little down time between assays. Additionally, the assay can be automated by sequentially injecting columns in an HPLC (FIG. 26).

11.1 High-Throughput Production of Membrane Proteins

For the production of membrane proteins, expression constructs such as pMT/V5-His-TOPO for expression in Drosophila cells or pcDNA3.1D/V5-His-TOPO for expression in CHO or 293 cells can be used without a secretory leader sequence but must at least have a membrane leader sequence. Though it is unlikely to be necessary for a membrane protein, since the cDNA encodes at least one transmembrane domain, an exogenous transmembrane domain (e.g., PDGFR transmembrane domain) may optionally be added at the 3′ end of the cDNA to assure insertion into the membrane. This transmembrane domain may be especially useful in the case that the cDNA is not full length. A cleavage site is inserted between the 3′ end of a cDNA encoding a membrane protein of interest and V5-His (e.g., a thrombin, Tobacco Etch Virus, or intein-based self-cleaving site). This can also be done for the secreted proteins above. The Drosophila, CHO, or 293 cells are transfected and cultured as described above for secreted proteins. The cells are pelleted and homogenized in Lysis Buffer containing Tween 20 (0.05%). The mixture is then cleared by centrifugation and purified using a nickel affinity column as described. The V5-His tag is removed by cleavage, and the protein is integrated into micelles. For example, the protein can be dissolved in methanol and mixed with dodecylphosphocholine (Avanti) in methanol. The methanol is evaporated, and the mixture is dissolved in aqueous buffer without detergent. The protein is then analyzed and used in the binding assays of the present invention as described above. The methods described by Lahiri et al. (J. Amer. Chem. Society 118, 2347-2358, 1996) can also be used to assay the binding of ligands to micelles containing these membrane proteins.

11.2 High-Throughput Production of Linear Expression Constructs

Linear expression constructs may be used instead of circular vectors for the expression of proteins of interest. In contrast to circular vectors, which are often amplified by being transfected into bacteria, replicated, and purified, linear expression constructs can be PCR amplified and directly transfected into the cells used for protein expression (e.g., Drosophila cells). As illustrated in FIG. 22, the linear expression constructs are generated by reacting a topoisomerase labeled 5′ nucleic acid containing a promoter and an optional secretory or leader sequence, a nucleic acid (e.g., a cDNA) encoding a protein of interest, and a 3′ nucleic acid containing a sequence encoding an affinity tag (e.g., a hexahistidine tag) and a polyA tail. For a membrane protein, a sequence encoding the PDGFR transmembrane domain may be inserted upstream of the sequence encoding the affinity tag, or this domain may alternatively be present in the cDNA. Preferably, the 5′ component contains a 5′ primer for PCR amplification after the cDNA is inserted; a promoter compatible with the cell type used for expression; an optional leader sequence to target a protein to be secreted, an internal protein, or a membrane protein; and a TOPO sequence. Preferably, the 3′ component contains a 3′ TOPO sequence, a His tag coding sequence or another sequence encoding an affinity tag for standardized purification, a polyA sequence, and a primer for PCR amplification after cDNA insertion. For expression of two genes, a third component is also used that preferably contains a first 3′ TOPO sequence, a His tag coding sequence or other sequence to facilitate protein purification, a polyA sequence, a spacer, a promoter for the cell type to be used for expression, an optional leader, and a TOPO sequence. Examples of the components of the 5′ and 3′ ends of these linear constructs are listed in Table 4.
TABLE 4 Construction of topoisomerase linear expression constructs

Expression system       5′ end        3′ end
Drosophila Secreted     pMT/BiP       His/PolyA
Drosophila Membrane     pMT           His/PolyA
293/CHO Secreted        CMV/Leader    His/PolyA/SV40ori
293/CHO Membrane        CMV           His/PolyA/SV40ori

For the generation of these linear expression constructs, a polylinker containing restriction enzyme sites such as EcoRI can be used. The polylinker may contain any number of restriction enzyme sites including, but not limited to, EcoRI, BamHI, XbaI, SalI, HindIII, PvuII, XhoI, EcoRV, SacI, and BglII. Alternatively, the construct can be made without the polylinker (e.g., made with just one restriction enzyme site). In addition, the SV40 promoter, RSV promoter, EF-1α promoter, ubiquitin promoter, or any other promoter can be substituted for CMV. Similarly, dual gene expression constructs can be constructed with expression cassettes containing two promoters (e.g., CMV and EF-1α). Promoters and leaders may be selected to enable constitutive, inducible, transient, stable, surface, secreted, or internally targeted expression. The SV40 origin sequence may be included to allow amplification in the presence of SV40 T antigen expressed in the cell lines. Other origins including, but not limited to, the EBV oriP may alternatively be used. These constructs may be produced using standard molecular biology techniques either as a linear element or as part of a plasmid followed by release by restriction enzyme digestion or by PCR amplification. Each of the elements may be synthesized as an oligomer for elements less than 100 nucleotides in length, or isolated by restriction digestion, PCR amplification, or other techniques from a plasmid (e.g., including, but not limited to, pMT/BiP/V5-His A, B, or C, or pcDNA3.1, Invitrogen), and sequentially linked as individual components or groups using standard molecular biology techniques. In the case of PCR amplification of the 5′ element from a plasmid, a primer upstream of the promoter and a second primer (e.g., preferably including a CCCTT sequence for adaptation with topoisomerase and a GTGG or other sequence for directional cloning, see below) downstream of the promoter and the leader may be used.
In thecase of PCR amplification of the 3′ element, a primer upstream of theV5-His or at least the polyA (e.g., preferably including the CCCTTsequence for adaptation with Topoisomerase) and a second primerdownstream of the polyA signal or the Ori, may be used. Alternativeconstruction methods known to those skilled in the art may also beemployed.

Once these constructs are made, the EcoRI site is cleaved, and the 3′ strands of DNA at both the 5′ and the 3′ end are PCR extended with the CCCTT sequence. Alternatively, an oligo containing the CCCTT sequence may be inserted and cleaved using standard molecular biology techniques. Other slight modifications of these sequences may alternatively be used including an A or a T. These 3′ strands are then adapted with topoisomerase (TOPO; Vaccinia Topoisomerase I, Sigma) to produce a covalent DNA (3′ phosphotyrosyl) protein adduct between tyrosine 274 of topoisomerase I and the 3′ T in the DNA sequence. This reaction can be performed by mixing pmole levels of DNA containing the 3′ CCCTT topoisomerase sequence and topoisomerase at a 5-fold excess of topoisomerase in 50 mM Tris at pH 8 (e.g., 0.2 pmole duplex DNA to 1 pmole topoisomerase) using the methods of Sekiguchi et al. (J. Biol. Chem. 272: 15721-15728, 1997). The 5′ and 3′ ends can be modified in this fashion in their linear form or attached to a plasmid with a restriction site which allows their release from the plasmid after they have been labeled with topoisomerase.
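
The molar ratio given above (0.2 pmole duplex DNA to 1 pmole topoisomerase) is a simple proportion. The following Python sketch illustrates one way the amounts might be computed. The function names are hypothetical, and the conversion from DNA mass to pmole assumes the common approximation of 660 g/mol per base pair for duplex DNA, which is not stated in the text.

```python
# Illustrative sketch only; names and the 660 g/mol-per-bp figure are assumptions.
AVG_BP_MASS = 660.0  # approximate average mass of one base pair, g/mol (assumption)


def duplex_pmol(mass_ng, length_bp):
    """Convert a mass of duplex DNA (ng) of a given length (bp) to pmol."""
    # ng / (g/mol) gives nmol; multiplying by 1000 gives pmol.
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def topo_pmol_needed(dna_pmol, fold_excess=5.0):
    """Return the pmol of topoisomerase for the stated molar excess over DNA."""
    return dna_pmol * fold_excess


if __name__ == "__main__":
    # The example from the text: 0.2 pmol duplex DNA at 5-fold excess.
    print(topo_pmol_needed(0.2))
    # A hypothetical 1000 bp element supplied as 132 ng of DNA.
    print(duplex_pmol(132.0, 1000))
```

With the values from the text, 0.2 pmole of duplex DNA corresponds to 1 pmole of topoisomerase.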

These constructs are made in all three reading frames by modifying the PCR amplified element by adding either a single N or a double N on the 5′ strand upstream of the CCCTT topoisomerase sequence. To perform the ligation, each cDNA is PCR amplified to contain a 5′ A on each strand which is complementary to the 3′ T in the topoisomerase sequence and mixed with the linear TOPO reagents. For a directed ligation in the 5′ and 3′ orientation, the cDNA is PCR amplified using a primer at the 5′ end with CACC, and the 5′ end of the vector is modified with GTGG at the end of the 5′ and 3′ strand by PCR amplification prior to TOPO labeling. A blunt end or an end containing other sequences to achieve directed ligation may also be used. In this case, the 3′ end is either (i) blunt ended on both the cDNA and the 3′ end expression construct by using a proofreading polymerase or (ii) they are as above. The ligation may be performed with high fidelity polymerase (0.5 U Pst). The whole construct is then PCR amplified using the two primers on the 5′ and 3′ ends, which rapidly results in linear DNA for transfection into cell lines and does not require bacterial growth. Thus, this method is easily automated. The linear DNA typically integrates into chromosomal DNA and is expressed by the transfected cell. Optionally, the PCR primer distal ends may be ligated into circular form to facilitate origin-based (e.g., SV40 or another ori) amplification after transfection into a cell line expressing the transactivator (e.g., T antigen in the case of SV40 ori). These constructs can be used for transient or stable transfection.
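
The reading-frame spacers and the directional sequences described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions: the function names and the placeholder sequences (`ATG`, `ATGGCC`) are hypothetical and are not taken from the text. Only the CCCTT site, the N or NN spacer, and the CACC and GTGG sequences come from the description.

```python
# Illustrative sketch only; function names and example sequences are hypothetical.
TOPO_SITE = "CCCTT"   # topoisomerase recognition sequence from the description
DIRECTIONAL_TAG = "CACC"  # 5' tag added to the cDNA forward primer
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def frame_variants(element):
    """Return the element followed by 0, 1, or 2 N spacers and the CCCTT site."""
    return [element + "N" * pad + TOPO_SITE for pad in range(3)]


def directional_forward_primer(gene_specific):
    """Prepend CACC to a gene-specific forward primer sequence."""
    return DIRECTIONAL_TAG + gene_specific


def complement(seq):
    """Base-by-base complement (the partner strand read antiparallel)."""
    return seq.translate(COMPLEMENT)


if __name__ == "__main__":
    for variant in frame_variants("ATG"):
        # Each variant shifts the downstream reading frame by one nucleotide.
        print(variant, len(variant) % 3)
    print(directional_forward_primer("ATGGCC"))
    # GTGG on the vector pairs base-by-base with CACC on the cDNA.
    print(complement(DIRECTIONAL_TAG))
```

The three variants differ in length by one nucleotide each, so together they cover all three reading frames relative to the inserted cDNA.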

Transfection of the CHO-F line (Life Technologies) with a plasmid expressing the SV40 T antigen adapts these cell lines, which are the classic mammalian cell lines for stable protein production, into a cell line appropriate for high level transient expression with SV40 based or CMV based promoters. Alternatively, 293 cells can be transfected with large T if it is not already expressed. Alternative amplification systems can also be used including transfecting CHO, 293, or another cell line with other viral proteins such as EBNA 1 from Epstein-Barr Virus for plasmids or linear expression elements containing EBV Ori-P. The cell lines may also be transfected with genes encoding enzymes involved in posttranslational modification, including, but not limited to, those involved in glycosylation (e.g., fucosyl transferase 7). Such cell lines produce targets with alternative posttranslational modifications which may be in a specific cell type relevant to the physiology or pathology. Other examples of cells that can be transfected with a linear construct of the invention include bacteria such as E. coli, insect cells such as Drosophila cells, or mammalian cells such as Cos, HEK293, or CHO cells.

Other Embodiments

From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.

Various publications and patent applications are cited herein, the contents of which are hereby incorporated by reference in their entireties to the same extent as if each independent publication or patent application was specifically and individually indicated to be incorporated by reference.

1. A method for selecting a candidate ligand which binds a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule with a library of candidate ligands under conditions that allow complex formation between said target molecule and one or more said candidate ligands, wherein said library comprises at least two different chemical scaffolds or comprises at least 11 different compounds; (b) isolating said complex; (c) recovering one or more said candidate ligands from said complex; and (d) identifying one or more recovered candidate ligands.
2. The method of claim 1, wherein step (d) comprises determining the MS, IR, FTIR, NMR, and/or UV spectrum of said recovered candidate ligand.
3. The method of claim 1, wherein at least 100 different candidate ligands are simultaneously contacted with said target molecule.
4. A method for selecting a candidate ligand which binds a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule with a library of candidate ligands under conditions that allow complex formation between said target molecule and one or more said candidate ligands; (b) isolating said complex; (c) recovering one or more said candidate ligands from said complex; and (d) determining the mass to charge ratio of an isotope or fragment peak in the mass spectrum of a recovered candidate ligand, thereby identifying said recovered candidate ligand.
5. The method of claim 4, wherein at least 100 different candidate ligands are simultaneously contacted with said target molecule.
6. The method of claim 4, wherein step (d) further comprises determining the mass to charge ratio of the parent peak in the mass spectrum of said recovered candidate ligand.
7. A method for selecting a candidate ligand which binds a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule of unknown biological function with a library of candidate ligands under conditions that allow complex formation between said target molecule and one or more said candidate ligands; (b) isolating said complex; (c) recovering one or more said candidate ligands from said complex; and (d) determining the MS, IR, FTIR, NMR, and/or UV spectrum of a recovered candidate ligand, thereby identifying said recovered candidate ligand.
8. The method of claim 7, wherein at least 100 different candidate ligands are simultaneously contacted with said target molecule.
9. A method for selecting a candidate ligand which binds a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule with one or more candidate ligands under conditions that allow complex formation between said target molecule and one or more said candidate ligands; (b) isolating said complex; (c) recovering one or more said candidate ligands from said complex; and (d) determining the IR, FTIR, NMR, and/or UV spectrum of a recovered candidate ligand, thereby identifying said recovered candidate ligand.
10. The method of claim 9, wherein at least 100 different candidate ligands are simultaneously contacted with said target molecule.
11. A method for selecting a candidate ligand which binds a target molecule, said method comprising: (a) contacting an in vitro sample comprising a first target molecule and a second target molecule with a library of candidate ligands under conditions that allow complex formation between said first target molecule and one or more said candidate ligands and allow complex formation between said second target molecule and one or more said candidate ligands; (b) isolating a first complex comprising said first target molecule bound to a candidate ligand and isolating a second complex comprising said second target molecule bound to a candidate ligand; (c) recovering one or more said candidate ligands from said first complex and/or from said second complex; and (d) identifying one or more recovered candidate ligands.
12. The method of claim 11, further comprising contacting said sample with a competitor ligand known to bind said target molecule, said first target molecule, or said second target molecule.
13. A method for determining the biological function of a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule of unknown biological function with a library of candidate ligands under conditions that allow one or more said candidate ligands to bind said target molecule; (b) selecting a candidate ligand which binds said target molecule; and (c) measuring the effect of said selected candidate ligand in a biological assay, thereby determining the biological function of said target molecule.
14. The method of claim 13, further comprising identifying said selected candidate ligand.
15. A method for determining the biological function of a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule that is upregulated or downregulated in a disease state, in the presence of a physiological stimulus, or during a specific cellular or biological process with a library of candidate ligands under conditions that allow one or more said candidate ligands to bind said target molecule; (b) selecting a candidate ligand which binds said target molecule; and (c) measuring the effect of said selected candidate ligand in a biological assay, thereby determining the biological function of said target molecule.
16. The method of claim 15, further comprising identifying said selected candidate ligand.
17. The method of claim 15, wherein said selected candidate ligand increases the activity of said target molecule in said biological assay.
18. The method of claim 15, wherein said selected candidate ligand decreases the activity of said target molecule in said biological assay.
19. A method for determining the biological function of a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule with a library of candidate ligands under conditions that allow one or more said candidate ligands to bind said target molecule; (b) selecting a candidate ligand which binds said target molecule; and (c) measuring the effect of said selected candidate ligand on a tissue from an organism having a disease or disorder or undergoing a specific cellular or biological process in the presence or absence of a physiological stimulus, thereby determining the biological function of said target molecule.
20. The method of claim 19, wherein said tissue is human tissue.
21. A method for reacting two or more ligands that bind a target molecule of interest, said method comprising contacting a cell or in vitro sample comprising a target molecule of unknown secondary or tertiary structure with a first ligand comprising a first crosslinker and with a second ligand under conditions that allow said target molecule to bind said first ligand and said second ligand and allow said first crosslinker to covalently bind said second ligand, thereby generating a crosslinked product comprising said first ligand and said second ligand.
22. A method for reacting two or more ligands that bind a target molecule of interest, said method comprising contacting a cell or in vitro sample comprising a target molecule with a first ligand comprising a first crosslinker and with a second ligand, wherein the location or the tertiary structure of the binding site in said target molecule for said first ligand or said second ligand is unknown, and wherein said contacting is conducted under conditions that allow said target molecule to bind said first ligand and said second ligand and allow said first crosslinker to covalently bind said second ligand, thereby generating a crosslinked product comprising said first ligand and said second ligand.
23. A method for reacting two or more ligands that bind a target molecule of interest, said method comprising contacting a cell or in vitro sample comprising a target molecule with a first ligand and with a second ligand, wherein said contacting is conducted under conditions that allow said target molecule to bind said first ligand and said second ligand and allow said first ligand to covalently bind said second ligand, thereby generating a product comprising said first ligand and said second ligand that has an affinity for said target molecule that is greater than the affinity of said first ligand or said second ligand for said target molecule.
24. A method for reacting two ligands that bind different target molecules, said method comprising contacting a cell or in vitro sample comprising a first target molecule and a second target molecule with a first ligand and with a second ligand, wherein said contacting is conducted under conditions that allow (i) said first protein to bind said first ligand, (ii) said second protein to bind said second ligand, and (iii) said first ligand to covalently bind said second ligand, thereby generating a product comprising said first ligand and said second ligand; and wherein the location or the tertiary structure of the binding site in said first target molecule for said first ligand and/or the location or the tertiary structure of the binding site in said second target molecule for said second ligand is unknown.
25. The method of claim 24, wherein the generation of said product indicates that said first protein and said second protein interact in vivo.
26. A method for isolating a second protein which binds a first protein, said method comprising: (a) contacting a cell or an in vitro sample comprising a first protein and a second protein with a first ligand comprising a first ligand and with a second ligand under conditions that allow (i) said first protein to bind said first ligand, (ii) said second protein to bind said second ligand, and (iii) said first ligand to covalently bind said second ligand, thereby generating a product comprising said first ligand and said second ligand and generating a complex comprising said product, said first protein, and said second protein; (b) isolating said complex; and (c) identifying said first protein and/or said second protein in said complex or recovered from said complex.
27. The method of claim 26, wherein said first protein comprises a detectable group.
28. The method of claim 26, wherein said second ligand comprises a crosslinker.
29. The method of claim 26, wherein the generation of said product indicates that said first protein and said second protein interact in vivo.
30. The method of claim 26, wherein the affinity of said product for said target molecule is greater than the affinity of said first ligand or said second ligand for said target molecule.
31. The method of claim 26, wherein said product is used in drug discovery or development or lead optimization.
32. The method of claim 26, wherein said product is used in the development of an agricultural or environmental agent.
33. A method for selecting a candidate target molecule which binds a small molecule of interest, said method comprising: (a) contacting an in vitro sample comprising a small molecule of interest having a moiety other than an amino acid or having a molecular weight less than 4000 daltons with a library of candidate target molecules under conditions that allow complex formation between said small molecule of interest and one or more said candidate target molecules; wherein said target molecules are not expressed on the surface of phage; (b) isolating said complex; and (c) recovering one or more said candidate target molecules from said complex, thereby selecting one or more candidate target molecules which bind said small molecule of interest.
34. The method of claim 33, wherein, prior to step (a), said small molecule of interest is selected from a library of small molecules based on its effect in a biological assay.
35. A method for selecting a target protein which binds a small molecule of interest, said method comprising: (a) expressing in a population of cells a protein fusion comprising a target protein covalently linked to a surface protein, said expression being carried out under conditions that allow the display of said protein fusion on the surface of said cells; (b) contacting said cells with a small molecule of interest having a moiety other than an amino acid or having a molecular weight less than 4000 daltons; and (c) selecting said cells which bind said small molecule of interest, thereby selecting said target proteins which bind said small molecule of interest.
36. The method of claim 35, wherein said cell is a mammalian, bacterial, yeast, or insect cell.
37. A method for selecting a target protein which binds a small molecule of interest, said method comprising: (a) expressing in a population of cells a protein fusion comprising a target protein covalently linked to a surface protein, said expression being carried out under conditions that allow the display of said protein fusion on the surface of viruses released from said cells infected with said virus; (b) contacting said viruses with a small molecule of interest, wherein said small molecule of interest (i) is a nucleic acid, (ii) is a carbohydrate, (iii) is a lipid, (iv) has a moiety other than an amino acid, (v) has a molecular weight less than 750 daltons, or (vi) is not a molecule naturally produced by bacteria; and (c) selecting said viruses which bind said small molecule of interest, thereby selecting said target proteins which bind said small molecule of interest.
38. The method of claim 37, wherein said virus is a bacteriophage or adenovirus.
39. A method for selecting a target protein which binds a small molecule of interest, said method comprising: (a) expressing in a population of cells or an in vitro sample a library of target proteins, wherein each target protein is covalently linked to a nucleic acid encoding said target protein; (b) contacting said cells or in vitro sample with a small molecule of interest having a moiety other than an amino acid or having a molecular weight less than 4000 daltons; and (c) selecting said target proteins which bind said small molecule of interest.
40. The method of claim 39, further comprising identifying said selected target protein.
41. The method of claim 39, wherein at least 100 human target proteins are contacted with said small molecule of interest.
42. The method of claim 39, wherein said small molecule of interest is a non-naturally occurring molecule.
43. A method for selecting a candidate compound that binds or modulates the activity of a target molecule prior to validation of said target molecule as a drug target, said method comprising: (a) contacting a cell or an in vitro sample comprising a target molecule that has not been previously validated as a drug target with a library of candidate compounds under conditions that allow one or more said candidate compounds to bind or modulate the activity of said target molecule; and (b) selecting a candidate compound which binds or modulates the activity of said target molecule.
44. The method of claim 43, wherein said library comprises at least five candidate compounds.
45. The method of claim 43, further comprising the step of (c) measuring the effect of said selected candidate compound in a biological assay, thereby determining the biological function of said target molecule.
46. A method for selecting candidate compounds that bind or modulate the activity of target molecules, said method comprising: (a) contacting a cell or an in vitro sample comprising a first target molecule and a second target molecule with a library of candidate compounds under conditions that allow one or more said candidate compounds to bind or modulate the activity of said first target molecule and allow one or more said candidate compounds to bind or modulate the activity of said second target molecule; (b) selecting a candidate compound which binds or modulates the activity of said first target molecule; and (c) selecting a candidate compound which binds or modulates the activity of said second target molecule.
47. The method of claim 46, wherein said cell or in vitro sample comprises at least five target molecules, and wherein, for each of said target molecules, a candidate compound is selected that binds or modulates the activity of said target molecule.
48. An electronic database comprising at least 10 records of target molecules correlated to records of ligands and their ability to bind or modulate the activity of said target molecules.
49. The database of claim 48, comprising records for at least 0.5% of the proteins in the proteome of an organism.
50. An electronic database comprising at least 10 records of target molecule domains correlated to records of ligands and their ability to bind said domains.
51. An electronic database comprising a plurality of records of target molecules that have not been previously validated as drug targets correlated to records of ligands and their ability to bind or modulate the activity of said target molecules.
52. A computer comprising the database of claim 48, 50, or 51, and a user interface (i) capable of displaying one or more ligands that bind or modulate the activity of a target molecule whose record is stored in said computer or (ii) capable of displaying one or more target molecules that bind or have an activity that is modulated by a ligand whose record is stored in said computer.
53. An electronic database comprising at least 1000 records of compounds correlated to records of a phenotype in one or more biological assays effected by said compounds; wherein said biological assay involves a cell or in vitro sample that does not contain an exogenous copy of a nucleic acid encoding a protein that binds said compound.
54. A computer comprising the database of claim 53 and a user interface (i) capable of displaying one or more phenotypes in one or more biological assays for a compound whose record is stored in said computer or (ii) capable of displaying one or more compounds that effect a phenotype whose record is stored in said computer.
55. An electronic database comprising at least 10 records of target molecules correlated to records of an expression profile or activity of said target molecules.
56. An electronic database comprising a plurality of records of target molecules that have not been previously validated as drug targets correlated to records of an expression profile or activity of said target molecules.
57. A computer comprising the database of claim 55 or 56 and a user interface (i) capable of displaying one or more expression profiles or activities of a target molecule whose record is stored in said computer or (ii) capable of displaying one or more target molecules that have an expression profile or activity whose record is stored in said computer.
58. A method of identifying a target molecule associated with a phenotype of interest, said method comprising: (a) providing a first electronic database comprising a plurality of records of phenotypes in a biological assay correlated to records of the ligands and their ability to contribute to said phenotypes; (b) receiving a selection of a phenotype of interest; (c) identifying one or more ligands in said first database which cause said phenotype of interest; (d) providing a second electronic database comprising a plurality of records of ligands correlated to records of the target molecules which bind said ligands or have an activity that is modulated by said ligands; and (e) identifying one or more target molecules in said second database that bind or are modulated by said ligand(s) which cause said phenotype of interest, thereby identifying one or more target molecules associated with said phenotype of interest.
59. The method of claim 58, wherein said phenotype of interest is associated with a disease state, and said target molecule is determined to promote or inhibit said disease state.
60. The method of claim 58, wherein said method is computer implemented.
61. A method of identifying a phenotype that is associated with a target molecule of interest, said method comprising: (a) providing a first electronic database comprising a plurality of records of target molecules correlated to records of the ligands and their ability to bind or modulate the activity of said target molecules; (b) receiving a selection of a target molecule of interest; (c) identifying one or more ligands in said first database which bind or modulate the activity of said target molecule of interest; (d) providing a second electronic database comprising a plurality of records of ligands correlated to records of phenotypes in a biological assay caused by said ligands; and (e) identifying one or more phenotypes in said second database caused by said ligand(s), thereby identifying one or more phenotypes associated with said target molecule of interest.
62. The method of claim 61, wherein said method is computer implemented.
63. A method of identifying a ligand that binds or modulates the activity of a target molecule of interest, said method comprising: (a) providing an electronic database comprising at least 10 records of target molecules correlated to records of the ligands and their ability to bind or modulate the activity of said target molecules; (b) receiving a selection of a target molecule of interest; and (c) identifying one or more ligands in said database which bind or modulate the activity of said target molecule of interest.
64. The method of claim 63, wherein said ligand is used in drug discovery or development or lead optimization.
65. The method of claim 63, wherein said ligand is used in the development of an agricultural or environmental agent.
66. The method of claim 63, wherein said method is computer implemented.
67. The method of claim 63, further comprising comparing the chemical structures of two or more ligands which bind or modulate the activity of said target molecule of interest, thereby identifying functional groups in said ligands which promote the binding or modulation of said target molecule of interest.
68. The method of claim 63, further comprising comparing the chemical structures of two or more ligands which bind or modulate the activity of said target molecule of interest, thereby determining the frequency of one or more functional groups or scaffolds in the collection of said ligands.
69. The method of claim 63, further comprising generating one or more compounds that have one or more functional groups that are present in two or more of said ligands; wherein said compound is used in drug discovery or development or lead optimization.
70. A method of identifying a target molecule that binds or has an activity that is modulated by a ligand of interest, said method comprising: (a) providing an electronic database comprising at least 10 records of ligands correlated to records of the target molecules which bind or have an activity that is modulated by said ligands; (b) receiving a selection of a ligand of interest; and (c) identifying one or more target molecules in said database which bind or have an activity that is modulated by said ligand of interest.
71. The method of claim 70, wherein said method is computer implemented.
72. A method for determining the selectivity of a ligand of interest, said method comprising: (a) providing an electronic database comprising at least 10 records of target molecules correlated to records of the ligands and their ability to bind or modulate the activity of said target molecules; (b) receiving a selection of a ligand of interest; and (c) determining the number of target molecules in said database that bind or are modulated by said ligand, thereby determining the selectivity of said ligand of interest.
73. The method of claim 72, wherein said method is computer implemented.
74. The method of claim 72, wherein said ligand increases an activity of a target molecule, wherein said activity is associated with a disease state, an adverse side-effect, or toxicity and said ligand is eliminated from drug discovery or development or lead optimization.
75. The method of claim 72, wherein said ligand decreases an activity of a target molecule, wherein said activity is associated with a disease state, an adverse side-effect, or toxicity and said ligand is selected for drug discovery or development or lead optimization.
76. A method for selecting a therapy for a subject for the treatment, stabilization, or prevention of a disease or disorder, said method comprising: (a) providing an electronic database comprising at least 10 records of target molecules correlated to records of the therapeutics and their ability to bind or modulate the activity of said target molecules; (b) determining a target molecule in said subject that has a mutation associated with said disease or disorder; and (c) selecting a therapeutic from said database that binds or modulates the activity of said target molecule and thereby treats, stabilizes, or prevents said disease or disorder.
77. The method of claim 75, wherein said method is computer implemented.
78. A method for selecting a therapy for a subject for the treatment, stabilization, or prevention of a disease or disorder, said method comprising: (a) providing an electronic database comprising at least 10 records of target molecules correlated to records of the therapeutics and their ability to bind or modulate the activity of said target molecules; (b) determining a target molecule in said subject that has a mutation associated with said disease or disorder; and (c) selecting a therapeutic from said database that does not bind or modulate the activity of said target molecule.
79. The method of claim 78, wherein said target molecule is a protein.
80. The method of claim 78, wherein said target molecule is a nucleic acid.
81. The method of claim 78, wherein said method is computer implemented.
 82. A method of determining whether a compound of interest is present in a sample, said method comprising: (a) providing reference mass spectra for two or more compounds from a library of compounds; (b) providing a test mass spectrum of a sample comprising one or more compounds from said library; and (c) determining whether peaks of a reference mass spectrum are included in said test mass spectrum, thereby determining whether the compound that generated said reference mass spectrum is present in said sample.
 83. The method of claim 82, wherein said reference mass spectra are sequentially or simultaneously analyzed until all of the peaks in said test mass spectrum have been assigned to a compound.
 84. The method of claim 82, wherein step (c) comprises a sequential determination of whether the peaks of one or more reference mass spectra are included in said test mass spectrum.
 85. The method of claim 82, wherein step (c) is repeated until either (i) all of the peaks in said reference mass spectrum are determined to be present in said test mass spectrum, thereby determining that the compound that generated said reference mass spectrum is present in said sample; or (ii) a peak in said reference mass spectrum is determined to be absent in said test mass spectrum, thereby determining that the compound that generated said reference mass spectrum is not present in said sample.
 86. The method of claim 82, wherein step (a) comprises determining the mass spectrum of each compound in said library.
 87. The method of claim 82, wherein at least one of the peaks in said reference spectrum is an isotope peak or a fragment peak.
 88. The method of claim 82, wherein at least one of the peaks in said reference spectrum is a parent peak.
 89. The method of claim 82, wherein said reference mass spectra are contained in a database comprising records of one or more properties of mass spectra correlated to references of compounds that generate said mass spectra.
 90. The method of claim 82, wherein step (c) is computer implemented.
 91. A method of determining whether a compound of interest is present in a sample, said method comprising: (a) providing reference mass spectra for two or more compounds from a library of compounds; (b) providing a test mass spectrum of a sample comprising one or more compounds from said library; (c) determining whether one or more peaks of said test mass spectrum are included in a reference mass spectrum; and (d) determining whether all of the peaks in a reference mass spectrum are present in said test mass spectrum, wherein said reference mass spectrum is a reference mass spectrum from step (c) that contains a peak present in said test mass spectrum, thereby determining whether the compound that generated said reference mass spectrum is present in said sample.
 92. The method of claim 91, wherein step (d) comprises a sequential determination of whether the peaks of one or more reference mass spectra are included in said test mass spectrum.
 93. The method of claim 91, wherein step (d) comprises determining whether a peak in said reference mass spectrum is present in said test mass spectrum, wherein said determination is repeated until either (i) all of the peaks in said reference mass spectrum are determined to be present in said test mass spectrum, thereby determining that the compound that generated said reference mass spectrum is present in said sample; or (ii) a peak in said reference mass spectrum is determined to be absent in said test mass spectrum, thereby determining that the compound that generated said reference mass spectrum is not present in said sample.
 94. The method of claim 91, wherein step (a) comprises determining the mass spectrum of each compound in said library.
 95. The method of claim 91, wherein at least one of the peaks in said reference spectrum is an isotope peak or a fragment peak.
 96. The method of claim 91, wherein at least one of the peaks in said reference spectrum is a parent peak.
 97. The method of claim 91, wherein said reference mass spectra are contained in a database comprising records of one or more properties of mass spectra correlated to references of compounds that generate said mass spectra.
 98. The method of claim 97, wherein said property is selected from the group consisting of: the mass to charge ratio of an isotope peak, the mass to charge ratio of a fragment peak, the mass to charge ratio of a parent peak, and the intensity of a peak.
 99. The method of claim 97, wherein step (c) or step (d) is computer implemented.
 100. A computer-readable memory having stored thereon a program for determining whether a compound of interest is present in a sample comprising: (a) computer code that receives as input mass spectrometry data comprising the mass to charge ratio for one or more peaks in reference mass spectra for two or more compounds from a library of compounds; (b) computer code that receives as input mass spectrometry data comprising the mass to charge ratio for one or more peaks in a test mass spectrum of a sample comprising one or more compounds from said library; and (c) computer code that determines whether peaks of a reference mass spectrum are included in said test mass spectrum, thereby determining whether the compound that generated said reference mass spectrum is present in said sample.
 101. A computer-readable memory having stored thereon a program for determining whether a compound of interest is present in a sample comprising: (a) computer code that receives as input mass spectrometry data comprising the mass to charge ratio for one or more peaks in reference mass spectra for two or more compounds from a library of compounds; (b) computer code that receives as input mass spectrometry data comprising the mass to charge ratio for one or more peaks in a test mass spectrum of a sample comprising one or more compounds from said library; (c) computer code that determines whether one or more peaks of said test mass spectrum are included in a reference mass spectrum; and (d) computer code that determines whether all of the peaks in a reference mass spectrum are present in said test mass spectrum, thereby determining whether the compound that generated said reference mass spectrum is present in said sample.
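
The peak-matching logic recited in claims 82, 85, 91, 93, 100 and 101 may be illustrated by the following minimal sketch. It is offered only as one possible reading under stated assumptions and is not the program of the invention. Each spectrum is assumed to be a plain list of mass to charge (m/z) values, peak intensities are ignored, and two peaks are assumed to match when their m/z values differ by no more than a tolerance. The function names, the tolerance value, and the example data are all hypothetical.

```python
from typing import Dict, List

Spectrum = List[float]  # assumed representation: list of peak m/z values


def has_peak(spectrum: Spectrum, mz: float, tol: float = 0.01) -> bool:
    """Return True if the spectrum has a peak within tol of mz (tol is an assumption)."""
    return any(abs(peak - mz) <= tol for peak in spectrum)


def compound_present(reference: Spectrum, test: Spectrum, tol: float = 0.01) -> bool:
    """Sequential check in the manner of claims 85 and 93.

    Reference peaks are examined one at a time. The loop stops at the first
    reference peak that is absent from the test spectrum (compound not present)
    and reports presence only if every reference peak was found.
    """
    for mz in reference:
        if not has_peak(test, mz, tol):
            return False
    return True


def candidate_references(references: Dict[str, Spectrum], test: Spectrum,
                         tol: float = 0.01) -> List[str]:
    """Step (c) of claim 91: keep references that contain at least one test peak."""
    return [name for name, ref in references.items()
            if any(has_peak(ref, mz, tol) for mz in test)]


def identify(references: Dict[str, Spectrum], test: Spectrum,
             tol: float = 0.01) -> List[str]:
    """Steps (c) and (d) of claim 91: prefilter, then require all reference peaks."""
    return [name for name in candidate_references(references, test, tol)
            if compound_present(references[name], test, tol)]


if __name__ == "__main__":
    # Hypothetical library of three compounds and a hypothetical test spectrum.
    library = {"A": [100.0, 150.0], "B": [100.0, 200.0], "C": [300.0]}
    sample = [100.0, 150.0, 300.0, 42.0]
    print(identify(library, sample))
```

In this sketch the prefilter of step (c) only narrows the set of reference spectra that receive the full check of step (d); a practical implementation would likely also compare intensities and isotope or fragment peaks, as contemplated in claims 87, 95 and 98.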
 102. A method of producing two or more vectors encoding proteins of interest, said method comprising: (a) robotically contacting a first nucleic acid encoding a first protein of interest with a first backbone nucleic acid in a first compartment in a robotic device under conditions that permit their reaction, thereby producing a first vector encoding said first protein; and (b) robotically contacting a second nucleic acid encoding a second protein of interest with a second backbone nucleic acid in a second compartment in said robotic device under conditions that permit their reaction, thereby producing a second vector encoding said second protein.
 103. The method of claim 102, further comprising: (c) robotically contacting said first vector with a first cell under conditions that allow the insertion of said first vector into said first cell; and (d) robotically contacting said second vector with a second cell under conditions that allow the insertion of said second vector into said second cell.
 104. The method of claim 103, wherein said first cell expresses said first protein and said second cell expresses said second protein.
 105. The method of claim 102, wherein at least 5 vectors are produced simultaneously.
 106. A method of purifying proteins, said method comprising: (a) expressing a first protein in a first cell under conditions that result in the secretion of said first protein into a first medium in a robotic device; (b) expressing a second protein in a second cell under conditions that result in the secretion of said second protein into a second medium in said robotic device; (c) robotically transferring said first medium to a first chromatography column and said second medium to a second chromatography column; and (d) purifying said first protein and said second protein.
 107. The method of claim 106, wherein at least 5 proteins are purified simultaneously.
 108. A DNA molecule comprising a promoter operably linked to a secretory or leader sequence, wherein said DNA molecule is linear and less than 3,500 nucleotides in length.
 109. The DNA molecule of claim 108, wherein said DNA molecule is less than 1,000 nucleotides in length.
 110. The DNA molecule of claim 109, wherein said DNA molecule is less than 500 nucleotides in length.
 111. The DNA molecule of claim 108, wherein said DNA molecule is labeled with topoisomerase.
 112. A DNA molecule comprising a promoter, wherein said DNA molecule is linear, less than 3,500 nucleotides in length, and labeled with topoisomerase.
 113. The DNA molecule of claim 112, wherein said DNA molecule is less than 1,000 nucleotides in length.
 114. The DNA molecule of claim 113, wherein said DNA molecule is less than 500 nucleotides in length.
 115. A DNA molecule comprising a nucleic acid segment encoding a histidine affinity tag and a nucleic acid segment encoding a polyA region, wherein said DNA molecule is linear and less than 3,500 nucleotides in length.
 116. The DNA molecule of claim 115, wherein said DNA molecule is less than 1,000 nucleotides in length.
 117. The DNA molecule of claim 116, wherein said DNA molecule is less than 500 nucleotides in length.
 118. The DNA molecule of claim 115, wherein said DNA molecule is labeled with topoisomerase.
 119. A DNA molecule comprising a first promoter operably linked to (i) a nucleic acid segment encoding a first protein of interest and a histidine affinity tag and (ii) a first polyA region, wherein said DNA molecule is linear.
 120. The DNA molecule of claim 119, wherein said nucleic acid segment encoding said first protein is operably linked to a secretory or leader sequence.
 121. The DNA molecule of claim 119, wherein said DNA molecule is less than 3,500 nucleotides in length.
 122. The DNA molecule of claim 121, wherein said DNA molecule is less than 1,000 nucleotides in length.
 123. The DNA molecule of claim 119, wherein said DNA molecule is labeled with topoisomerase.
 124. The DNA molecule of claim 119, further comprising a nucleic acid segment encoding a second protein of interest operably linked to said first promoter.
 125. The DNA molecule of claim 119, further comprising a second promoter operably linked to (i) a nucleic acid segment encoding a second protein of interest and (ii) a second polyA region.
 126. A method of producing a linear DNA molecule encoding a protein of interest, said method comprising robotically contacting a DNA molecule of claim 112, a linear DNA molecule encoding a first protein of interest, and a DNA molecule of claim 118 in a first compartment in a robotic device under conditions that permit their reaction, thereby producing a first linear DNA molecule encoding said first protein.
 127. The method of claim 126, further comprising robotically contacting a DNA molecule of claim 112, a linear DNA molecule encoding a second protein of interest, and a DNA molecule of claim 118 in a second compartment in said robotic device under conditions that permit their reaction, thereby producing a second linear DNA molecule encoding said second protein.
 128. The method of claim 127, further comprising: (c) robotically contacting said first linear DNA molecule with a first cell under conditions that allow the insertion of said first linear DNA molecule into said first cell; and (d) robotically contacting said second linear DNA molecule with a second cell under conditions that allow the insertion of said second linear DNA molecule into said second cell.
 129. The method of claim 128, wherein said first cell expresses said first protein and said second cell expresses said second protein.
 130. The method of claim 127, wherein at least 5 linear DNA molecules are produced simultaneously.
 131. A method of purifying a protein, said method comprising: (a) expressing a first protein in a first cell comprising a DNA molecule of claim 119 under conditions that result in the secretion of said first protein into a first medium in a robotic device; (b) robotically transferring said first medium to a first chromatography column; and (c) purifying said first protein.
 132. The method of claim 131, further comprising: (d) expressing a second protein in a second cell comprising a DNA molecule of claim 119 under conditions that result in the secretion of said second protein into a second medium in said robotic device; (e) robotically transferring said second medium to a second chromatography column; and (f) purifying said second protein.
 133. The method of claim 132, wherein at least 5 proteins are purified simultaneously.
 134. A CHO cell that is transiently transfected with a nucleic acid encoding an mRNA or protein of interest.
 135. The cell of claim 134, wherein said nucleic acid is a linear DNA molecule.
 136. The cell of claim 134, wherein said cell is transiently or stably transfected with a nucleic acid encoding SV40 T antigen.
 137. The method of claim 23, further comprising: (a) contacting an in vitro sample comprising said target molecule with one or more said products under conditions that allow complex formation between said target molecule and one or more said products; (b) isolating said complex; (c) recovering one or more said products from said complex; and (d) identifying one or more said recovered products.
 138. A method for selecting a candidate ligand which binds a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule with a library of candidate ligands under conditions that allow complex formation between said target molecule and more than one candidate ligand; (b) isolating said complex; (c) recovering more than one candidate ligand from said complex; and (d) contacting a cell or in vitro sample comprising said target molecule with a first recovered ligand and a second recovered ligand, wherein said contacting is conducted under conditions that allow said target molecule to bind said first recovered ligand and said second recovered ligand and allow said first recovered ligand to covalently bind said second recovered ligand, thereby generating a product comprising said first recovered ligand and said second recovered ligand that has an affinity for said target molecule that is greater than the affinity of said first recovered ligand or said second recovered ligand for said target molecule.
 139. The method of claim 138, further comprising: (e) contacting an in vitro sample comprising said target molecule with one or more said products under conditions that allow complex formation between said target molecule and one or more said products; (f) isolating said complex; (g) recovering one or more said products from said complex; and (h) identifying one or more said recovered products.
 140. A method for selecting a candidate ligand which binds a target molecule, said method comprising: (a) contacting an in vitro sample comprising a target molecule with a library of candidate ligands under conditions that allow complex formation between said target molecule and more than one candidate ligand; (b) isolating said complex; (c) recovering more than one candidate ligand from said complex; and (d) reacting a first recovered ligand and a second recovered ligand, thereby generating a product comprising said first recovered ligand and said second recovered ligand that has an affinity for said target molecule that is greater than the affinity of said first recovered ligand or said second recovered ligand for said target molecule.
 141. The method of claim 140, further comprising: (e) contacting an in vitro sample comprising said target molecule with one or more said products under conditions that allow complex formation between said target molecule and one or more said products; (f) isolating said complex; (g) recovering one or more said products from said complex; and (h) identifying one or more said recovered products.